The Weather and Other Factors Influencing Employee Punctuality *


Roland E. Mueser 
The Pennsylvania State College 


On a beautiful warm unseasonal day in the 
middle of February 1951, the majority of 
usually sleepy students arrived bright and 
early for an 8:00 o’clock college class. Such 
promptness at this hour was as unusual as 
the spring day in February. The coincidence 
invited the comparison of weather and at- 
tendance. Did the early morning brightness 
stimulate these otherwise uninspired students? 
The hypothesis suggested itself: Increased 
light intensity might be causing early awaken- 
ing, or it might hasten the morning routine of 
washing, dressing, and breakfasting. This 
study was undertaken to determine the corre- 
lation between early morning illumination and 
one indicator of human activity. Promptness 
in reporting to work was used as a criterion 
which might be accurately measured on a 
statistically significant population. 

The personnel of an engineering research 
laboratory on the campus was chosen because 
attendance figures could be readily obtained. 
Only those employees who were scheduled to 
start work at 8:00 a.m. were selected and a 
standardized list was used for holding a con- 
stant sample. It was, however, impossible to 
rigidly limit the sample to the identical group 
for many practical reasons. Part of the em- 
ployees were necessarily absent on business, 
vacation, or because of illness during intervals 
in the recording period. Eliminating their 
records was a prohibitive statistical task and 


*The author wishes to thank Dr. William M. 
Lepley for his advice and aid. In particular, it was 
Dr. Lepley’s classroom observation which was re- 
sponsible for undertaking this project. In addition, 
the cooperation of the members of the Ordnance 
Research Laboratory administration and Meteorology 
Department is sincerely appreciated.


would also result in a drastically reduced 
population size. Actually a total of 144 in- 
dividuals were on the standardized list. Of 
these, an average of 132.8 or 92.2% were at 
work in the Laboratory during the test period. 
By extending the study over a number of 
months a gross averaging effect has been 
achieved and should tend to minimize chance 
errors. 


Procedure 


Employees of the Laboratory were checked 
in by guards at the gate and the time recorded 
to the nearest five minute interval. An aver- 
age of 101.3 men and 31.5 women were timed 
six days a week from February 23, 1951 
through May 14, 1951, a total of 69 working 
days. Data were recorded independently for 
men and women to allow for later comparison. 
The majority of employees drive to work in 
private automobiles and the remainder all 
walk. No public conveyance is employed for 
transportation, hence patterns due to stand- 
ardized bus or train schedules are avoided. 
Similarly chance pattern disruptions, as might 
be due to a commuter train arriving late, are 
not present. Individual chance factors can- 
not, of course, be avoided. However, since 
an incident such as a flat tire affects no more 
than a single car pool, no large error is intro- 
duced by a single transportation mishap. The 
average distance traveled to work was 3.8 
miles for employees, with approximately 66% 
living in State College where the drive or walk
to work is less than a mile. Although the only 
primary division of the population is sex, inherently
this tends to produce strong secondary
selectivity. The women of the Laboratory
are mainly secretaries, clerks, typists,
and a few technicians. The female employees 
are, therefore, an exclusively non-supervisory 
group. The group of 101 male employees is 
composed 60% of research scientists and ad- 
ministrators. Most of the remainder are made 
up of machine shop and male technical em- 
ployees such as draftsmen and scientific as- 
sistants. 

The weather data were obtained from the 
Meteorological Department of the College. 
The most important information for this 
study, a measure of light intensity, was ob- 
tained from the department’s Eppley pyr- 
heliometer. Readings were taken from the 
record of light intensity at half-hour inter- 
vals from 6:00 through 8:00 a.m. and a total 
of these values was used as a measure of the 
light intensity for the early morning. Ap- 
plying these figures to all employees does, of 
course, involve the assumption that the gen- 
eral atmospheric conditions are the same at 
all homes as at the college. The closeness of
most residences to the Laboratory and the
averaging effect of a large sample would be
expected to reduce errors due to this assumption.

Nine other meteorological variables were 
observed at 7:00 a.m. These were included 
in the study in order that they might be con- 
sidered as secondary influences on punctuality 
behavior. 

There is a question as to how early and how 
late arrival times are to be tabulated to obtain 
an average figure which is a sensitive indica- 
tor of deviations which atmospheric conditions 
might be expected to introduce. Although 
there is no obvious error in including all em- 
ployees who arrived early, extreme lateness 
would seem due, a disproportionate fraction 
of the time, to purely chance factors rather 
than the interplay of a subtle meteorological 
influence. The flat tire, sick child, morning 
shopping trip, inoperative alarm clock, or the
morning return from a business trip all intro- 
duce delays which would overshadow the 
effects being sought. A subsequent study of 
employees arriving very late verified the fact 
that these variations occur randomly. It was 
decided, therefore, to study most intensively 
the average arrival time of employees arriving
at work less than 22.5 minutes late. In
order that this group be balanced a similar 
limit was placed on early arrivals so that em- 
ployees arriving after 7:37.5 a.m. but before 
8:22.5 a.m. were studied. Eighty-six per- 
cent of all men and 97 percent of all women 
arrived between these times. The average 
arrival times were calculated independently 
for each sex. 
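The windowed averaging described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `windowed_mean_arrival`, the dictionary layout, and the arrival counts are hypothetical and are not taken from the study's records. Arrivals are assumed to be tallied at five-minute marks, and only marks falling between 7:37.5 and 8:22.5 a.m. enter the average.

```python
def to_minutes(mark):
    """Convert an 'HH:MM' clock mark to minutes after midnight."""
    hours, minutes = mark.split(":")
    return int(hours) * 60 + int(minutes)


def windowed_mean_arrival(counts, lower=7 * 60 + 37.5, upper=8 * 60 + 22.5):
    """Mean arrival time (minutes after midnight) of arrivals inside the window.

    counts maps a five-minute mark such as '07:55' to the number of
    employees recorded at that mark (hypothetical layout).
    """
    total = 0
    weighted = 0.0
    for mark, n in counts.items():
        m = to_minutes(mark)
        if lower < m < upper:  # drop very early and very late arrivals
            total += n
            weighted += m * n
    if total == 0:
        raise ValueError("no arrivals inside the window")
    return weighted / total


# Hypothetical tally for one morning; 07:30 and 08:30 fall outside the window.
counts = {"07:30": 4, "07:45": 10, "07:55": 40, "08:00": 30, "08:05": 10, "08:30": 3}
mean_minutes = windowed_mean_arrival(counts)
print(f"{int(mean_minutes // 60)}:{mean_minutes % 60:05.2f}")
```

Trimming both tails symmetrically keeps the average from being dominated by the chance delays discussed above, at the cost of ignoring the very early group, which is why that group is treated separately.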

Conjecture would lead one to believe that 
the very early arrivals—i.e. those coming be- 
fore 7:37 a.m.—would be extremely sensitive 
to meteorological influence since their attend- 
ance is not so keenly forced by the conform- 
ity-producing 8:00 a.m. deadline. Further- 
more, as a group they might be expected to 
exhibit greater individuality. In other words, 
the early birds are more nearly free to do as 
they please and so should react markedly to 
any atmospheric condition which tends to produce
stimulated or sluggish behavior. Because
of this, arrival times of these early employees
were also studied as an independent
group. Since no women regularly came to 
work before 7:37 a.m., this computation was 
only possible for men. 


Distribution of Arrival Times 


The distribution of arrival times for men 
and women was computed for the period 
March 9 to May 14. Figure 1 shows the dis- 
tribution of average arrival times of 32 women 
and 101 men. 

The distribution is similar to that obtained 
by F. H. Allport (1) called a J curve because 
of the decrement characteristic of arrival 
times before the 8:00 a.m. deadline. The 
drop-off after 8:00 o'clock is also in agreement 
with earlier data, being steeper and in the 
shape of a reversed J. The greater steepness 
on the right side is expected since individuals 
arriving late are exhibiting non-conforming 
behavior. The conjecture seems reasonable
that if the time of arrival were stipulated as
8:00 a.m. but no social or economic pressure
forcing conformity was applied, the resulting
distribution might be a Gaussian or normal
curve. Conversely, as the factors to produce
conformity (i.e., to get to work on time) become
more compulsive, not only will the curve
be displaced to the left (as Allport points out)
but the skewness of the distribution should
become accentuated. For the extreme opposite
of “free-will” attendance there would
be a situation where the punishment for lateness
was so severe that a tardy employee
would stay absent rather than risk lateness.
Such a situation is not as fantastic as it
sounds, for it is close to the actual circumstance
where being late in catching an important
plane or train is as bad as not arriving
at all.

1 Since the raw data of the study were grouped in
five minute intervals all class divisions are on the
half minute and the statistics have been computed
on this basis. However, to avoid the awkward
dangling .5, intervals are quoted here to the minute
and the additional fraction is to be understood. In
this case 7:37 and 8:22.

In the distribution illustrated here there is 
a marked difference in the average behavior 
of men and women. Whereas an average of 
about 8% of the men arrive extremely early, 
before 7:37, only 0.1% of the women come 
during this period. Similarly, more than twice 
as many men come extremely late, after 8:22,
as women. Practically no female employee 
arrived at work later than 9:20 a.m., yet a 
few men drifted in every morning as late as 
10:45 a.m. Beyond this point it is a moot 
question whether it is tardiness or half day 
absenteeism which is occurring. 

The tendency for the attendance charac- 
teristics of men to be more widely distributed 
than women is believed due to the high pro- 
portion of male research and professional 
workers. Research is traditionally an occupa- 
tion catering to individualism and personal 
work habits. Even in a highly structured 
situation where it is generally expected that 
regular hours will be kept, some of the tech- 
nical employees do not take attendance rules 
too literally. The tendency towards pro- 
nounced earliness would seem to be due to 
the greater personal job interest. This is an 
understandable manifestation among men 
where working is a career. Almost all the 
women employees are performing less stimu- 
lating service jobs and few expect to work 
longer than a few years. 

The time for men and women coming be- 
tween 7:37 and 8:22 was averaged for each 
working day in the test period. If the time 
which these employees arrive at work is being 
influenced by a common factor, the mean ar- 
rival times of the two sexes should have a 
positive correlation. Indeed, such a test bears 
a resemblance to split-half methods of computing
test reliability. Actually the average
arrival times of men and women have a correlation
coefficient of .43 with a level of significance
of 0.02%. Although the two groups
vary in a similar manner from day to day the 
mean arrival time for men, 7:55.52 a.m., is 
2.6 minutes earlier than that for the women. 
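A rough check of the quoted significance level can be made with the Fisher z transformation. The sketch below assumes n = 69 paired daily means (one per working day of the test period) and a two-sided test; neither assumption is stated in the text, and the original computation may have used a different method. The function name `fisher_z_p_value` is illustrative.

```python
import math


def fisher_z_p_value(r, n):
    """Two-sided p-value for a correlation r from n pairs.

    Uses the Fisher z approximation: atanh(r) * sqrt(n - 3) is treated
    as a standard normal deviate under the null hypothesis of no correlation.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


# r = .43 is from the text; n = 69 working days is an assumption.
p = fisher_z_p_value(0.43, 69)
print(f"two-sided p is roughly {100 * p:.3f}%")
```

Under these assumptions the result comes out near two hundredths of one percent, the same order as the level quoted above.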

It would be expected that precipitation 
would tend to make employees tardy since the 
majority drive to work, roads become slippery 
under these circumstances, and visibility is 
impaired. However, a comparison of rainy 
mornings with dry ones failed to reveal any 
significant difference due to this factor (see 
Table 1). 


Table 1

Effect of Rain on Mean Arrival Time of Employees
Arriving Between 7:37 and 8:22 a.m.

                Fair Weather    Precipitation
                                (25 days)
101 Men         7:55.
32 Women        7:58.
True Average    13S


Since it is difficult to imagine that on the 
average no impairment of driving conditions 
existed due to precipitation, it appears that 
employees foresightedly take bad driving con- 
ditions into account and tend to compensate 
for the circumstance. It is also possible that 
some phenomenon adjunctive to rain has a 
reverse effect and tends to produce early at- 
tendance. Later results lend credence to this 
hypothesis. 


Weekly Cycle of Punctuality 


The popularity of the expression “blue 
Monday” and a general totaling of introspec- 
tive reports following any weekend would lead 
one to believe that the day of the week might 
have an influence on work attendance figures. 
A plot of the mean arrival time for the em- 
ployees as a function of day of the week is 
given in Figure 2. It can be seen that there 
is agreement between the two groups with the 
exception of a let-down among men on Wed- 
nesdays. If punctuality can be considered a 
criterion of general feeling tone it is apparent 
that Monday is indeed “blue,” people are 
hitting their stride by midweek, and tend to 
be more tardy as the weekend approaches.

A breakdown of all employees into time
arrival groups per day illustrates a cross-pat- 
tern in punctuality habits as is shown in Fig- 
ure 3. Fewer employees arrive Just on Time 
and more Late as the week progresses and 
people coming far ahead of time follow a dif- 
ferent pattern from those arriving just a few 
minutes early. The curves are primarily of 
interest because they illustrate that the shape 
of the arrival distribution given in Figure 1 
will vary somewhat as a function of the day 
of the week. 
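The weekly cycle amounts to grouping the daily mean arrival times by day of the week. The sketch below shows that grouping and one possible way of removing the cycle from a daily series, namely subtracting each weekday's mean and adding back the overall mean. The study does not describe how its own correction for the weekly cycle was made, so this adjustment is an assumption, and the data and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def weekday_means(daily):
    """Average the daily mean arrival times (minutes after midnight) per weekday."""
    groups = defaultdict(list)
    for weekday, value in daily:
        groups[weekday].append(value)
    return {weekday: mean(values) for weekday, values in groups.items()}


def remove_weekly_cycle(daily):
    """Subtract each weekday's mean and restore the overall mean (assumed method)."""
    overall = mean(value for _, value in daily)
    by_day = weekday_means(daily)
    return [(weekday, value - by_day[weekday] + overall) for weekday, value in daily]


# Hypothetical daily means, in minutes after midnight.
daily = [("Mon", 477.0), ("Mon", 479.0), ("Wed", 475.0),
         ("Wed", 476.0), ("Sat", 478.0), ("Sat", 480.0)]
cycle = weekday_means(daily)
adjusted = remove_weekly_cycle(daily)
print(cycle)
print(adjusted[0])
```

An adjusted series of this kind lets a weather variable be compared with punctuality without the weekday pattern inflating or masking the relationship.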


Fig. 2. Weekly cycle of average arrival time of people coming between 7:37 and 8:22 a.m.


Meteorological Effects 


A figure representing early morning brightness
was obtained by measuring the light intensity
recordings of an Eppley pyrheliometer
at half hour intervals from dawn until 8:00
a.m. The sum of these values, I, gives a
rough integration of total morning brightness.
Over the February to May period covered by
the study the average value of I gradually increased.
However, there was no significant
change in employee arrival times, indicating
adaptation to the seasonal light change. In
general, however, the day-to-day fluctuations
were far greater than the seasonal change (see
Figure 4). Because of the wide range of values,
and because psychophysical brightness discrimination
is a relative rather than absolute
phenomenon, light values have been considered
logarithmically (2).

The correlation between the average daily
arrival times and light intensity, 10 log I, is
given in Table 2.

In general the correlations between light
and promptness are significant at about the
10% level but surprisingly in an inverse manner
from that originally expected. Thus on
the average both men and women arrive at
work significantly earlier when the morning
is dull and later when it is bright. The daily
mean arrival times for men were grouped into
thirds and those for women into fifths. From
Figure 5 it is apparent that the reaction was
similar in both men and women. The average
arrival time of women fluctuated over a much
wider interval than that of the men. The
men who arrived Very Early reacted in a
contrary manner to the morning light stimuli
just as they did with respect to the weekly
cycle in Figure 3.

Fig. 3. Weekly cycle of punctuality for men and women.

Fig. 4. Daily light intensity from dawn to 8:00 a.m.
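The brightness measure, 10 log I, can be illustrated as follows. The sketch assumes I is the plain sum of the half-hourly pyrheliometer readings and that the logarithm is taken to base 10; the readings and their units are hypothetical, and the function name is illustrative.

```python
import math


def morning_brightness(readings):
    """Return 10 * log10 of the summed half-hourly light readings."""
    total = sum(readings)
    if total <= 0:
        raise ValueError("summed light intensity must be positive")
    return 10 * math.log10(total)


# Hypothetical readings at 6:00, 6:30, 7:00, 7:30 and 8:00 a.m. (arbitrary units).
dull = morning_brightness([0.5, 1.0, 2.0, 2.5, 4.0])       # sum is 10
bright = morning_brightness([5.0, 10.0, 20.0, 25.0, 40.0])  # sum is 100
print(dull, bright)
```

On this scale a tenfold increase in summed intensity adds only ten units, which compresses the wide day-to-day range into a form closer to how brightness differences are perceived.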

The data were also examined to determine
whether any of the following meteorological
conditions might be a factor influencing the
average arrival times: (1) Corrected barometric
pressure at 7:00 a.m.; (2) Barometric
tendency in last 20 hours with respect to direction
and magnitude; (3) Amount of barometric
fluctuation, i.e. atmospheric pressure
roughness existing in the previous 20 hours
(roughness was defined as pressure variations
lasting less than an hour and including tendency
reversals); and (4) Change in light intensity
in the preceding 24 hours (both direction
of change and curve steepness were considered).


Fig. 5. Effect of light intensity on average time of arrival at work.


Mills (5), Winslow (6), and others have
noted various psychological effects of barometric
pressure on human beings. However,
none of the listed factors showed a significant
correlation with the punctuality criterion.
The highest correlation coefficient obtained
was +.12 between the barometric pressure
and the average arrival time of women. A
slight correlation in this direction would be
expected as arising from the fact that dull
overcast mornings are more common when the
barometer is low. The correlation coefficient
linking change in barometric pressure and
punctuality was only .01 and other factors
had similarly low values.


Table 2

Correlation of Employee Punctuality and
Early Morning Light Intensity
(Morning Brightness, 10 log I)

                                          Correlation    Level of
                                          Coefficient    Significance

Average arrival time, most Men (7:37-8:22 a.m.)
Average arrival time, most Men, after correction for weekly cycle
Average arrival time, most Women (7:37-8:22 a.m.)
Average arrival time, most Women, after correction for weekly cycle
Average arrival time, very early Men (before 7:37)

+.22    7%
+.16

Numerous studies have shown that tem- 
perature and humidity factors are of physio- 
logical importance (4). Nevertheless, over the 
period studied, the environment of the sub- 
jects was almost entirely controlled by arti- 
ficial means, home furnaces, car heaters, etc., 
rather than by the prevailing meteorological 
conditions. No important correlation of such 
factors as temperature, humidity, wind di- 
rection or wind velocity with employee at- 
tendance was evident. At this time of the 
year people are largely protected from di- 
rect weather influence by house walls and 
enclosed cars. These enclosures probably 
reduce all psychological and physiological 
weather manifestations. However, atmos- 
pheric pressure and light intensity would seem 
to penetrate such barriers to a greater extent 
than wind, cold, or humidity. 


Discussion and Summary 


The data studied here covering 8000 arrival 
times over a three-month period indicate that
as a group, employees arrive at work in a
pattern apparently inversely dependent on
the brightness of the morning light. Such a
trend appears equally true for both sexes,
with the exception of about 6%, all males, 
who come to work far ahead of the official 
starting time. Since their early arrival in it- 
self sets these 6% as rather an individualistic 
group, it is not surprising to find that their 
reaction to both the weekly “fatigue” cycle 
and the light intensity is the converse of the 
rest of the workers. 

In general, these reactions by employees 
would seem to reflect on their attitude about 
their jobs rather than serve as any reliable 
indication of feeling tone or satisfyingness (3). 
Thus it is easy to imagine that when it was 
sunny and beautiful outside the chore of earn- 
ing a livelihood was put off. Perhaps a few 
fathers played a little longer with their chil- 
dren or paused to sniff a crocus on their way 
to the car. On the other hand on a dark, 
dismal morning, more often than not, the 
tired secretary and sleepy engineer drank 
their coffee more quickly and set off to work 
promptly and without fanfare. 

Those men who arrive very early would 
seem to regard their work differently than the 
majority. One can only conjecture that they 
eagerly hurry to the job in the early morning 
sunshine, anxious to start another day of 
activity while their compatriots dawdle an 
extra two minutes admiring the blue skies. 
A world which produces both Stoics and 
Epicureans should not find such diverse behavior
surprising.


Received November 13, 1952. 


References 


1. Allport, F. H. The J-curve hypothesis of conforming
behavior. In Readings in social psychology.
New York: Henry Holt, 1947.

2. Bartley, S. H. Studying vision. From Methods
of psychology. New York: Wiley and Sons,
1948.

3. Bills, A. G. Studying motor functions and efficiency.
From Methods of psychology. New
York: Wiley and Sons, 1948.

4. Hirsh, J. Comfort and disease in relation to climate.
Climate and man. Washington, D. C.:
U. S. Government Printing Office, 1941.

5. Mills, C. A. Medical climatology. Springfield,
Ill.: Charles C Thomas, 1939.

6. Winslow, C.-E. A. and Herrington, L. P. Subjective
reactions of human beings to certain
outdoor atmospheric conditions. Heat., Piping
& Air Conditioning, 1935, 7, 551-556.





The Journal of Applied Psychology
Vol. 37, No. 5, 1953 


Prediction of Turnover Among Clerical Workers 


Philip H. Kriedt and Marguerite S. Gadel 


The Prudential Insurance Company, Newark, N. J. 


Companies like The Prudential Insurance 
Company which hire a large number of High 
School girl graduates to do routine clerical 
work frequently have a turnover problem. 
We find that among the High School girls we 
hire each year some become permanent em- 
ployees and make a career of their jobs. A 
larger number work for a few years and then 
quit to become housewives and raise a family. 
Both these groups we feel are good invest- 
ments. There is a third group of new em- 
ployees which concerns us, however. They 
are the girls who leave in a year or less to 
take other jobs or to go to college. These we 
consider to be a turnover problem. 

We have done several investigations to see 
if we can reduce our turnover rate by de- 
termining at the time of employment whether 
or not a girl is a good turnover risk. Some 
of our most recent findings from this research 
are summarized in this article. 


Predictor Measures 


All High School girls hired in June, 1951, 
were given an experimental battery of tests 
and questionnaires selected as possible pre- 
dictors of turnover. The battery included a 
measure of intelligence, a measure of clerical 
aptitude, an interest questionnaire, a biographical
data blank, and a job preference
questionnaire. 


1. General ability or intelligence was measured 
by two tests: Vocabulary and Arithmetic Reason- 
ing. Scores for these two tests were combined 
for purposes of predicting turnover. 

2. Clerical aptitude was measured by four tests: 
Name Checking, Number Checking, Dotting, and 
Letter-Digit Substitution. These four scores were 
combined to give a single clerical speed test score. 

3. Interest scores were obtained from a ques- 
tionnaire developed by the Company consisting 
of 285 items similar in form and content to those 
used by Strong. A key to predict turnover, con- 
sisting of 15 items scored by unit weights, was 
developed from data obtained in a previous study. 
A longer key consisting of 43 items did not cross- 
validate as well as the shorter key. The key 
identifies poor turnover risks as girls who like 
artistic, literary, scientific, selling, and social service
activities, and who dislike manual, mechanical
and clerical activities.

4. Biographical information was obtained in a 
blank including both factual and attitudinal ques- 
tions related to educational and family back- 
ground. Fourteen multiple choice questions were 
given unit scoring weights. Some examples of 
questions are these: 

Would you like to go to college if you could
afford it? (Check one) ___ a. Yes; ___ b.
No; ___ c. Not sure.

Which of the following best describes your
High School course of study? (Check one)
___ a. College preparatory or academic; ___ b.
Commercial and secretarial; ___ c. General; ___ d.
Other

Which of the following occupational groups
best describes your father’s work during most of
his life? (Check one) ___ a. professional;
___ b. managerial or executive; ___ c. own
business; ___ d. clerk in a store; ___ e. clerk
in an office; ___ f. salesman; ___ g. skilled
trade; ___ h. farmer or rancher; ___ i. semi-skilled
(factory worker, miner, etc.); ___ j.
Other; ___ k. Don’t know.

5. The last predictor was a job preference questionnaire
which is a modification of the form developed
by Jurgensen (3) at the Minneapolis Gas
Company. This form requires the respondent to
rank 11 factors (Advancement, Benefits, Commutation,
Company, Co-workers, Hours, Pay,
Security, Supervision, Type of Work, and Working
Conditions) in terms of their relative importance
to her; and also to rate the importance of
having a job which is interesting, important, not
strenuous, free from work pressure, uses one’s
abilities, has much responsibility, and allows freedom
for planning one’s own work.


Procedure and Results 


This battery was administered to 358 em- 
ployees in June, 1951. Sixty-five of them left 
in three months or less and 43 more left from 
four to twelve months after being hired. 
Point biserial correlations were computed for 
each of the five predictor variables for three- 
month turnover as well as twelve-month turn- 
over. The validity coefficients for three- 
month and twelve-month turnover are given 
in Table 1. 
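For readers unfamiliar with the statistic, a point biserial correlation between a continuous score and a stay/leave dichotomy can be computed as sketched below. The coding (stayed = 1, left = 0) is chosen so that a negative value means those who left scored higher, which matches the sign convention of Table 1; the scores, the flags and the function name are hypothetical and do not come from the study.

```python
import math
from statistics import mean, pstdev


def point_biserial(scores, stayed):
    """Point biserial correlation between scores and a stayed/left flag.

    r = (mean of stayers - mean of leavers) / SD of all scores * sqrt(p * q),
    where p and q are the proportions staying and leaving and SD is the
    population standard deviation.
    """
    stay_scores = [s for s, flag in zip(scores, stayed) if flag]
    left_scores = [s for s, flag in zip(scores, stayed) if not flag]
    if not stay_scores or not left_scores:
        raise ValueError("both groups must be non-empty")
    p = len(stay_scores) / len(scores)
    q = 1 - p
    return (mean(stay_scores) - mean(left_scores)) / pstdev(scores) * math.sqrt(p * q)


# Hypothetical test scores: the two girls who left scored higher.
scores = [10, 12, 14, 16]
stayed = [True, True, False, False]
r = point_biserial(scores, stayed)
print(round(r, 3))
```

The value is identical to an ordinary Pearson correlation computed with the dichotomy coded 1 and 0.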

Table 1 shows that the General Ability
Tests have a validity of -.25 for three-month
turnover and -.21 for twelve-month turnover.
Negative validity means that girls who
left had higher scores than those who stayed.
In a previous study of eighteen-month turnover
for 1600 girls a validity of -.17 was
obtained for these two tests. Clerical speed
tests have practically zero validity. In two
previous studies these tests had slightly higher
validity. The interest turnover key has validity
of .19 for both groups. This is a cross-validation
result as the key was developed in
another study. Biographical Data yields validities
of .37 and .29. Girls who leave, as
compared with those who stay, more frequently
say they took college preparatory
courses, have fathers in professional and
managerial jobs, and would like to go to
college if they could afford it. Although these
identical items have not been used before,
similar questions have been used with similar
results. The validities of the Job Preference
Questionnaire, .33 and .21, have not been
cross-validated. The key for this measure was
developed empirically on this sample. Girls
who leave, as compared with girls who stay,
placed more importance on type of work, pay,
and on having a job which used their abilities
and gave them freedom to plan their own
work. Those who left placed less importance
than those who stayed on working for a company
they are proud of, on company benefits
and on being free from work pressure and
strenuous physical requirements. Since these
results have not been cross-validated we did
not compute intercorrelations between the Job
Preference Questionnaire and other predictors
and we did not use Job Preference scores in
our multiple correlation solution. We will be
interested in future cross-validation of the
results obtained with this questionnaire.


Table 1

Point Bi-serial Correlations Between Various Predictors and Turnover for Clerical Employees*

                          3 Month Turnover      12 Month Turnover
                          (Leaving N = 65)      (Leaving N = 108)
Predictor                 (Staying N = 293)     (Staying N = 250)

General Ability Tests          -.25                  -.21
Clerical Speed Tests            .03                   .05
Interest Questionnaire          .19                   .19
Biographical Data               .37                   .29
Job Preference Blank            .33                   .21

* Negative correlations in this table indicate that those who left scored higher than those who stayed.


Intercorrelations among the four variables
used in doing multiple correlations are given
in Table 2. In Table 3, you will see that a
multiple R of .40 was obtained for three-month
turnover and .33 for twelve-month
turnover. In both prediction equations, Biographical
Data has much more weight than
the other predictors.


Table 2

Intercorrelations of Predictor Variables
(N = 358)

                          Clerical     Interest         Biographical
                          Speed Tests  Questionnaire    Data

General Ability Tests        .20          -.32             -.41
Clerical Speed Tests                      -.04             -.10
Interest Questionnaire                                      .39

Table 3

Multiple Correlation Data

                      Multiple
                      Point
                      Biserial
Turnover Group        R           Beta Weights*

3 Month Turnover      .40         Biographical Data        1[illegible]
                                  General Ability Tests    -5
                                  Clerical Speed Tests     3
                                  Interest Questionnaire   1

12 Month Turnover     .33         Biographical Data        8
                                  General Ability Tests    -4
                                  Clerical Speed Tests     3
                                  Interest Questionnaire   2

* High positive score indicates that individual is likely to stay.


In order to determine the practical usefulness
of the three-month turnover equation,
we examined the data to see what would have
happened if, at the time of employment, we
had rejected the 35 girls out of the total
group of 358 who had the lowest scores on
the three-month turnover battery. Under
present labor market conditions we would not
want to reject many more than this. We
found that if we had rejected these 35 girls
we would have rejected 23 girls who would
leave in three months or less and only 12 girls
who would stay longer than three months.
This means that we would have rejected 36%
of the total group who would leave in three
months and we would have rejected only 4%
of those who would stay longer than three
months. Thus it appears that we can use our
turnover equations to screen out a substantial
percentage of girls who would quit the Company
very quickly and would not justify their
training expense, and at the same time only
lose a small percentage of the girls who become
useful long time employees.


Table 4

Effectiveness of Three-Month Turnover Battery:
Actual Behavior of the 35 Girls with Lowest
Scores as Compared with Remainder of Sample

              Left in               Stayed
              Less than 3 Months    3 Months or More    Total

Accepted           42                    281              323
Rejected           23                     12               35
Total              65                    293              358
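The screening percentages quoted in the text follow directly from the four cell counts of Table 4, as the short calculation below shows. The variable names are illustrative. The first ratio works out to roughly 35%, marginally below the 36% quoted, which may simply reflect rounding in the original. The last two lines are an added illustration, not a figure reported in the article: they compare the three-month turnover rate of the whole group with the rate among those who would have been accepted.

```python
# Cell counts from Table 4.
rejected_left, rejected_stayed = 23, 12
accepted_left, accepted_stayed = 42, 281

total_left = rejected_left + accepted_left        # 65 girls left within three months
total_stayed = rejected_stayed + accepted_stayed  # 293 girls stayed longer
total = total_left + total_stayed                 # 358 girls in all

# Share of quick leavers who would have been screened out.
share_of_leavers_rejected = rejected_left / total_left
# Share of longer-term employees who would have been lost.
share_of_stayers_rejected = rejected_stayed / total_stayed

# Three-month turnover rate before and after screening (illustrative comparison).
turnover_before = total_left / total
turnover_after = accepted_left / (accepted_left + accepted_stayed)

print(f"{share_of_leavers_rejected:.1%} of leavers rejected")
print(f"{share_of_stayers_rejected:.1%} of stayers rejected")
print(f"turnover {turnover_before:.1%} before, {turnover_after:.1%} after screening")
```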


Summary 


As a summary of the findings and implica- 
tions of this study we would like to make the 
following points. 

1. We can predict quick turnover among 
newly hired girls for routine clerical jobs 
moderately well using a combination of Bio- 
graphical Data, an Interest Questionnaire, 
General Ability Tests, and Clerical Speed 
Tests. Biographical Data is the best pre- 
dictor. The other measures increase only 
slightly the effectiveness of prediction as esti- 
mated by multiple correlation. 

2. We can predict turnover for girls who 
leave in less than three months better than 
for girls who leave in less than twelve months. 
As you might infer from this, we cannot pre- 
dict four to twelve month turnover nearly as 
well as one to three month turnover. One 
possible explanation of this is that girls who
leave very quickly are more definitely unsuited
for their jobs than those who leave
later and therefore their turnover is more
predictable. Another possible explanation 
is that a very high proportion of the three- 
month turnover group go on to college and 
this kind of turnover may be especially pre- 
dictable. 

3. The use of General Ability tests with 
negative weights in selecting girls who will be 
good turnover risks does not conflict with our 
aptitude batteries used to predict job per- 
formance on beginning assignments, since the 
valid predictors for most of those jobs are 
tests of clerical ability rather than the Arith- 
metic Reasoning and Vocabulary tests. 

4. Textbooks in industrial psychology (1, 
p. 248, 2, pp. 313-314, 4, pp. 89, 97) fre- 
quently stress the negative relationship be- 
tween intelligence and the likelihood of a 
person staying on a routine clerical job, and 
recommend the use of upper critical scores 
on intelligence tests for selecting personnel 
for such jobs. While we did find the same 
negative relationship, it is interesting to note 
that in this study other factors such as family 
and educational background and interests and 
aspirations tend to be more important than 
intelligence. 

For our purposes we do not think it neces- 
sary or desirable to use an upper critical score 
on intelligence. General ability scores are 
related to success on most of our higher level 
jobs, and in order to have girls with poten- 
tiality for advancement it is necessary to hire 
a number with high general ability. For- 
tunately our research indicates we can hire 
girls with such ability who will be fairly good 
turnover risks as well as good performers on 
beginning jobs if we screen them carefully on 
biographical and interest measures, and cleri- 
cal aptitude tests. 

The common way of presenting evidence
concerning the validity of a selective device is 
by means of the validity coefficient. When 
this coefficient is high then the device is said 
to be useful as a means for evaluating candi- 
dates for a job, and when it is low the device 
is said to be ineffectual. Taylor and Russell 
(2) have shown that this notion is too simple 
to adequately describe the situation, since 
gains resulting from the use of a selective de- 
vice also will be a function of the proportions 
of persons selected and rejected. When the 
validity coefficient is low and the ratio of the 
number of persons selected to the number of 
applicants is low, the gains may be greater 
than when the validity is high and the selec- 
tion ratio also high. 

Taylor and Russell’s approach has been to 
evaluate the effectiveness of the results of
selection in terms of the proportion of selected
persons who turn out to be successful on the
job. That is, some cut-off point is set on the
criterion and all individuals who meet or ex-
ceed this critical point are deemed successful,
while those falling below are termed unsuc-
cessful. In many situations this approach is
exceedingly useful. Thus, in a training pro- 
gram where a specified proportion of persons 
are to be passed, knowing the validity of a 
test and the proportion of persons who will 
be selected, an estimate can be made of the 
proportion of those selected who will pass the 
course. Gains through use of the test can 
then be expressed in terms of the increase in 
the proportion of persons passing the train- 
ing program. 

In other situations, however, this is not the 
information desired. Rather, what is wanted 
is some estimate of the proficiency of those 
selected as they are measured by some con- 
tinuous scale. Thus the question might be 
asked, if a test of known validity is used and 
a given proportion of candidates is selected on 
the basis of their scores, how will the output
of the selected workers compare with that of
the unselected workers?¹ If the average pro-
duction of selected workers is not much
greater than that of unselected workers, then 
the test will not be worthwhile even though it 
possesses high validity. Furthermore, having 
an estimate of the potential proficiency of the 
selected workers will make it possible to im- 
prove the planning of production schedules. 
Suppose, for example, it were desired to place 
on a particular job persons whose average pro- 
duction is a given amount. Knowing the va- 
lidity of the test, the proportion to be selected 
to achieve a certain production schedule could 
be determined. 

Jarrett has recently considered this prob- 
lem and has developed a formulation which 
permits the appropriate estimates to be made 
(1). As with the Taylor-Russell approach 
normal linear correlations are assumed. The 
data necessary to estimate gains in proficiency 
from use of a selective device are the validity 
coefficient, the proportion of cases to be se- 
lected, and if per cent gains are to be esti- 
mated, the mean and standard deviation of 
the criterion scores of unselected cases. 

Table 1 is the basic table that has been de- 
veloped from Jarrett’s formulation. This 
table gives the mean of the standard criterion 
scores of the selected cases in relation to the 
validity and the selection ratio. The basic 
distribution of standard scores is of the un- 
selected cases, and has a mean of zero and a 
standard deviation of unity. For example, 
suppose the validity of the selective device is 
.50 and the 25% highest scoring candidates 
are selected, then the mean criterion score of
the selected cases would be .63 standard devia-
tions above the mean criterion score of the
unselected cases. By reversing signs, the
mean criterion score of rejected cases can also
be estimated. In the case just given, the
mean criterion score of the rejected 75% of
cases would be .21 standard deviations below
the mean of unselected cases.

1 As used in this paper the term “unselected” will
have the same meaning as given in Taylor and Rus-
sell’s (2) and Jarrett’s (1) discussions. It will refer
to “the members of that population of individuals
who apply for the job in question and who—when
individuals are needed for the job in question—
would have been put to work without further regard
for their qualifications before the testing program
was initiated.”

Table 1

Mean Standard Criterion Score of Selected Cases in Relation to Validity and the Selection Ratio

It is apparent from Table 1 that the smaller 
the selection ratio is the greater will be the 
mean criterion performance of the selected 
cases. Reduction in the selection ratio re- 
sults in an increase in mean criterion scores, 
the relationship being positively accelerated, 
the greatest increase in rate of gain occurring 
with selection rates smaller than about 20% 
to 30%. Similarly, as validity increases there 
is an increase in the mean criterion score of 
the selected cases. In this instance, however, 
it will be noted that gains are directly propor- 
tional to increase in validity. 
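
The tabled values can be approximated directly. The sketch below assumes the standard result for a bivariate normal distribution, namely that the mean standard criterion score of the top fraction p of test scorers is r·φ(z)/p, where r is the validity, z the cut-off on the test, and φ the normal density. The formula is not quoted from Jarrett's paper, and the function names are illustrative only.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def mean_selected_z(validity, selection_ratio):
    """Mean standard criterion score of the selected cases.

    Assumes test and criterion scores are bivariate normal and that the
    top `selection_ratio` fraction of test scorers is selected.
    """
    cutoff = _STD_NORMAL.inv_cdf(1.0 - selection_ratio)
    # Mean standard test score of those scoring above the cut-off.
    mean_test_z = _STD_NORMAL.pdf(cutoff) / selection_ratio
    return validity * mean_test_z


def mean_rejected_z(validity, selection_ratio):
    """Mean standard criterion score of the rejected cases.

    The selected and rejected means, weighted by group size, must
    average to zero, the mean of the unselected cases.
    """
    selected = mean_selected_z(validity, selection_ratio)
    return -selected * selection_ratio / (1.0 - selection_ratio)


if __name__ == "__main__":
    # Worked example from the text: validity .50, best 25% selected.
    print(mean_selected_z(0.50, 0.25))
    print(mean_rejected_z(0.50, 0.25))
```

For a validity of .50 and a selection ratio of 25% the two functions give values close to the .63 and .21 cited above. Because the validity enters only as a multiplier, doubling it doubles the gain, which is the proportionality noted in the text.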

In many cases the interest will not be in 
the standard criterion scores of the selected 
group but rather in raw criterion scores. 
Knowing the mean standard score of the se-
lected group, and the mean and standard de-
viation of the unselected group, the desired
transformation, of course, can easily be made. 
Thus in the case already given where the 
mean standard score of the selected cases was 
.63, if the mean and standard deviation of the 
raw criterion scores of the unselected cases 
were 50 and 10 respectively, the mean raw 
criterion score of the selected cases would be 
56.3. The per cent improvement in profi- 
ciency through selection, therefore, would be 
12.6. 
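
The transformation just described is simple arithmetic, and a minimal sketch may make it explicit. The function names are illustrative; the inputs are the figures of the example above.

```python
def raw_mean_of_selected(mean_z, raw_mean, raw_sd):
    """Convert the mean standard score of the selected group to raw units."""
    return raw_mean + mean_z * raw_sd


def percent_improvement(mean_z, raw_mean, raw_sd):
    """Gain of the selected group over the unselected mean, in per cent."""
    return 100.0 * mean_z * raw_sd / raw_mean


if __name__ == "__main__":
    # Mean standard score .63; unselected mean 50, standard deviation 10.
    print(raw_mean_of_selected(0.63, 50.0, 10.0))
    print(percent_improvement(0.63, 50.0, 10.0))
```

The per cent improvement depends on the raw scores only through the ratio of the standard deviation to the mean, which is why that ratio alone is needed to read the chart.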

The appropriate calculations have been per-
formed for various values of the ratio σ/M
and are presented graphically in Figure 1. 
An example will illustrate how this chart is 
read. Suppose we have a test with a validity 
of .50 and we are planning to select 20% of 
persons earning highest scores, the ratio of 
the standard deviation to the mean of the 
raw criterion scores of the unselected group 
being .2. Locating the value of the per cent 
selected, that is 20%, at the bottom and left 
of the chart, the line is followed up until it 
intersects with the curve representing a va-
lidity of .50. Now we follow the line across
to the right until it intersects with the vertical
line representing the σ/M ratio of .2. Per
cent improvement is determined from the
placement of this point in the series of curves;
in the present case this value would be inter-
polated as approximately 14%. The mean
standard score of the selected group, as also
read from the center column of the chart
(standard criterion scores of selected cases),
is .7.

Figure 1. Per Cent Increase in Proficiency from Use of Selective Devices

The chart, of course, can also be read in the 
reverse direction. Suppose the ratio of the 
standard deviation to the mean of the raw
criterion scores of the unselected cases is .25
and it is desired to improve criterion perform-
ance by 20% through selection of personnel.
Locating the point at the intersection of the
vertical line for a σ/M of .25 and the curve
for 20% improvement, following a line hori-
zontally will give various values of validity
and of selection ratio that will produce the
desired result. If as many as 50% of appli-
cants are to be selected, then test validity
would have to be perfect (1.00). With a
more reasonable validity of .40, only the best
5% could be selected.
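
Reading the chart in reverse amounts to solving the same relationship for an unknown. The following sketch assumes the bivariate normal result that per cent gain equals validity times the mean standard test score of those selected times σ/M; the function names and the bisection search are illustrative choices and are not taken from the source.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def mean_test_z_of_selected(selection_ratio):
    """Mean standard test score of the top `selection_ratio` fraction."""
    cutoff = _STD_NORMAL.inv_cdf(1.0 - selection_ratio)
    return _STD_NORMAL.pdf(cutoff) / selection_ratio


def required_validity(improvement, selection_ratio, sd_to_mean):
    """Validity needed for a given proportional gain (0.20 means 20%)."""
    return improvement / (sd_to_mean * mean_test_z_of_selected(selection_ratio))


def largest_selection_ratio(improvement, validity, sd_to_mean, tol=1e-6):
    """Largest selection ratio that still yields the desired gain.

    Uses bisection; the mean test score of the selected group falls
    steadily as the selection ratio rises.
    """
    target = improvement / (sd_to_mean * validity)
    low, high = 1e-6, 1.0 - 1e-6
    if mean_test_z_of_selected(low) < target:
        raise ValueError("gain not attainable with this validity")
    while high - low > tol:
        mid = (low + high) / 2.0
        if mean_test_z_of_selected(mid) >= target:
            low = mid
        else:
            high = mid
    return low


if __name__ == "__main__":
    # Example from the text: sigma/M of .25 and a desired gain of 20%.
    print(required_validity(0.20, 0.50, 0.25))
    print(largest_selection_ratio(0.20, 0.40, 0.25))
```

With half the applicants selected the required validity comes out at about 1.00, and with a validity of .40 the admissible selection ratio is between 5% and 6%, in keeping with the chart readings given above.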

It is apparent from Figure 1 that as the un-
selected workers become more homogeneous
in their criterion performance, that is, as the
value of σ/M decreases, the smaller will be
the gain from the selection device. For ex-
ample, with a validity of .50 and a selection
ratio of 10%, if σ/M is .3 then improvement
will be of the order of 27%. However, if the
σ/M is .05 then the per cent improvement will
only be about 5%. Probably the limiting
case for heterogeneity of criterion perform-
ance can be taken as a σ/M of .33, the stand-
ard deviation being one third the magnitude
of the mean. Since sometimes heterogeneity
of criterion performance is expressed in terms 
of the ratio of the output of the best to that 
of the poorest worker, an appropriate scale for 
such values is given on the chart at the foot 
of the right half of the figure. Since values 
here must be chosen arbitrarily, performance 
of the best worker is taken as being + 2.5 
standard deviations in the distribution of cri- 
terion scores and the poorest as — 2.5 stand- 
ard deviations. 
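
The auxiliary scale described above can be sketched as follows, taking the best and poorest workers at plus and minus 2.5 standard deviations as stated. The function name is illustrative.

```python
def best_to_poorest_ratio(sd_to_mean, extreme_z=2.5):
    """Ratio of the output of the best worker to that of the poorest.

    The best worker is placed `extreme_z` standard deviations above the
    mean and the poorest the same distance below it.
    """
    poorest = 1.0 - extreme_z * sd_to_mean
    if poorest <= 0.0:
        raise ValueError("poorest worker would have zero or negative output")
    return (1.0 + extreme_z * sd_to_mean) / poorest


if __name__ == "__main__":
    for ratio in (0.05, 0.10, 0.20, 0.25, 0.33):
        print(ratio, best_to_poorest_ratio(ratio))
```

On this convention a σ/M of .25 corresponds to a best-to-poorest ratio of a little over 4 to 1, and the ratio rises steeply as σ/M approaches .4, where the poorest worker's output would fall to zero.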

The picture of the value of selection as
given by this approach is by no means too
favorable. A validity of .50 is about as high
as can be expected in most instances and
seldom can a selection ratio be less than 10%.
A generous value of σ/M would be .25 (ratio
of best to poorest worker being 4 to 1). For
these values it will be seen from Figure 1 that
the expected improvement in criterion per-
formance is only 23%. In most cases va-
lidity will be somewhat lower, the selection
ratio higher, and criterion performance more
homogeneous. Under optimal conditions,
therefore, improvement in productivity as a
result of a selection program can be consid-
ered to approximate 25%.

In most personnel selection situations where
tests are used other sources of information
about applicants are considered as well in de-
ciding whether to accept or reject people.
Most writers of textbooks in employment 
psychology recommend the use of tests in 
just this way—as supplements to other valid 
personal data. Used in this way any test, no 
matter what its validity, may vary consid- 
erably as to the role it will play in a given 
company’s program, or among different com- 
panies. 

How much weight should a personnel of-
ficer place upon the test scores of two people
when other data about them are also avail-
able? Even low validity coefficients become
actuarially significant in the course of many de-
cisions based upon test scores.² The Taylor-
Russell tables ³ specify the efficiency of selec-
tion using tests whose validity coefficients may
vary between 0 and 1, depending upon vari-
ous existing employment conditions. How-
ever, these tables assume that all decisions
will be made over the long term on the basis
of test scores alone.

The author is very grateful to Professor James
N. Mosel of George Washington University, discus-
sions with whom suggested the main concepts of this
paper. It is also a pleasure to thank Dr. John R.
Boulger of this office for his helpful review of the
manuscript.

2 Tiffin, J. Industrial psychology. New York:
Prentice-Hall, 1952.

What is needed is a guide which will help a 
personnel officer decide about the risk he may 
be assuming in relying entirely upon the 
achievement in a test by two or more appli- 
cants for a position—or in choosing to ignore 
relative scores in favor of non-test considera- 
tions. It is possible to specify the probability 
that a test has correctly ranked two people in 
terms of a criterion of job performance when 
the difference in their standard scores on the 
test is known. If the assumptions for com- 
puting the product-moment coefficient of cor- 
relation had been properly met in computing 
the validity coefficient, and all scores are now
expressed as standard scores, then we can
derive Table 1.⁴ This gives the probability
that for any two subjects selected at random
from among those taking the test, the one of
them who has earned the higher score in the
test will be the better worker—in terms of the
criterion of validity for that test.

3 Taylor, H. C. and Russell, J. T. The relation-
ship of validity coefficients to the practical effec-
tiveness of tests in selection: discussion and tables.
J. appl. Psychol., 1939, 23, 565-578.

Table 1
The Probability of Selecting the Better of Two Workers on the Basis of the
Difference in Their Test Scores

As an example of how Table 1 would be 
applied in a practical situation consider the 
following data: 


A achieves a standard score of .95 in a 
test; B earns a score of .20; the validity 
coefficient of the test is .7; what is the 
probability that A will prove to be a better 
worker than B? 


4 See: Jenkins, W. L. An index of selective effi-
ciency (S) for evaluating a selection plan. J. appl.
Psychol., 1953, 37, 78, for a comparable treatment
which disregards test score difference.

Since the difference between the scores of
the two men is .75 standard score units, the 
first column of the table is entered at .75, and 
moving over to the column for r= .7 the 
tabled probability is given as .70. This means 
that on the basis of test score alone there are 
about 7 chances in 10 that A will turn out 
to be the better worker. Or, viewed con- 
versely, if the personnel officer should decide 
to disregard their relative achievement on the 
test and select B over A for the job, there 
would be only about 3 chances in 10 that his 
decision will prove to be correct. 
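
A probability of this kind can be approximated without the table. The sketch below is a reconstruction under stated assumptions, not the author's derivation: test and criterion are taken to be bivariate normal standard scores and the two applicants are treated as independent, so that the criterion difference, given a test difference d, is normal with mean r·d and variance 2(1 − r²). The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def prob_higher_scorer_is_better(score_difference, validity):
    """Probability that the higher test scorer is the better worker.

    `score_difference` is the difference between the two standard test
    scores (higher minus lower); `validity` is the validity coefficient.
    """
    if not -1.0 < validity < 1.0:
        raise ValueError("validity must lie strictly between -1 and 1")
    z = validity * score_difference / sqrt(2.0 * (1.0 - validity ** 2))
    return NormalDist().cdf(z)


if __name__ == "__main__":
    # Worked example from the text: scores .95 and .20, validity .7.
    p = prob_higher_scorer_is_better(0.95 - 0.20, 0.7)
    print(p, 1.0 - p)
```

For the example above the sketch gives a probability close to the tabled .70, and its complement, about .30, is the chance that choosing B over A would prove correct.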

Table 1 is an effective way to illustrate the 
meaning of a validity coefficient to personnel 
people in terms of their own operations. 

The primary purpose of this study was to
compare ratings made by supervisory person- 
nel and by co-workers on candidates for pro- 
motion to leadman jobs. Specifically, answers 
to the following questions were sought: 


(1) To what extent do supervisory per- 
sonnel and co-workers agree in their ratings 
of workers? 

(2) How does the extent of this agreement 
compare with (a) the extent to which mem- 
bers of supervision agree with each other, and 
(b) the extent to which co-workers agree with 
each other in the ratings given workers? 


In analyzing the data to answer these ques- 
tions, answers were suggested for other ques- 
tions, such as: 


(3) How do judgments on different items 
in the rating form compare with each other? 

(4) Is there any evidence that supervisors 
tend to rate candidates lower or higher than 
do their co-workers? 

(5) How do the totals of the ratings on the 
individual characteristics compare with the 
ratings on the suitability of the candidate for 
promotion? 


The problem is of practical importance in 
determining the reliability of ratings by super- 
visors and co-workers and in arriving at the 
appropriate weights to be given ratings made 
by them in an over-all evaluation of candi- 
dates for promotion. The provision in many 
union contracts which states that promotions 
to jobs covered by the contract are to be 
governed by seniority only when ability, skill 
and job performance are equal draws atten- 
tion to the need for devising techniques for 
determining workers’ suitability for promo- 
tion. These techniques must be acceptable 
to the union and management and, at the same 
time, be statistically sound. It then becomes 
important to analyze the results of these tech-
niques in their actual application. The pres-
ent study provides data on two of these
techniques, namely, co-worker ratings and 
supervisory ratings. 

From a theoretical standpoint, the study 
contributes some data on the attitudes of 
two distinct groups in the economic structure 
and on the relative homogeneity of thought 
of these two groups with respect to one aspect 
of their work environment. An accumulation 
of such data will enable us at some future 
time to arrive at a psychological and socio- 
logical understanding of the two groups which 
will be invaluable to the industrial psy- 
chologist. 


Ratings Studied 


This study is based on the ratings made on 
100 men who were candidates for leadman¹
jobs in 14 different departments of the manu- 
facturing division of a major aircraft com- 
pany. The ratings were made as a regular 
phase of the company’s supervisory selection 
program in which each candidate is evalu- 
ated on the basis of his work experience, edu- 
cation, work record, and scores on mental 
ability, shop math, and job knowledge tests, 
in addition to the ratings. Ratings are made 
by two supervisors, representing two levels of 
supervision over the candidate, and by three 
co-workers who work closely with the can- 
didate but who are not eligible to be candi- 
dates for the leadman job. The ratings 
analyzed here are the ratings made by two 
members of supervision and two of three co- 
workers (selected at random) for each of the 
100 candidates. A total of 68 different as- 
sistant foremen and foremen made the super- 
visory ratings. The exact number of co- 
workers participating cannot be reported since 
these rating forms were not signed, but the 
number was probably between 150 and 175. 

1 At North American Aviation, Inc., a leadman di-
rects a group of five to ten men. The job is covered
by union contract.

A worker ordinarily rated only one worker
for any one job opening and rarely did a job 
opening occur in the same group during the 
period studied. 

The two rating forms used were the “be- 
havior sample” type in which five gradations 
from very poor to outstanding were described 
for each characteristic. The form used by 
the co-workers consisted of five factors; 
namely, job knowledge, job performance, co- 
operation, ability to train others, and suita- 
bility for promotion to leadman. The form 
used by supervisory personnel consisted of 
eight factors; namely, job knowledge, quality 
of work done, quantity of work done, co- 
operation, drive, observing rules, personal 
appearance and manner, and suitability for 
promotion to leadman. The raters were in- 
structed to check the one statement for each 
factor which best described the candidate. 
The ratings were made independently. For 
purposes of this report, the five intervals have 
been assigned values of 1 through 5, from 
lowest to highest. 


Statistical Method 


The degree of relationship between the 
variables studied has been measured by the 
product moment coefficient of correlation. 
It was believed that the nature of the data 
justified the use of this technique because 
the series were more nearly continuous than 
discrete and more nearly quantitative than 
qualitative. 

When a difference is described in the report 
as significant, that difference is so large that 
it could be expected by chance not more than 
once in 100 times (P = 0.01). 
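
As an illustration of the method, the sketch below computes a product-moment coefficient for two sets of ratings on the five-point scale and checks whether a coefficient is greater than zero at the 1% level. The ratings are hypothetical, and the significance check uses a one-tailed Fisher z approximation, which is an assumption; the report does not say which test was applied.

```python
from math import atanh, sqrt
from statistics import NormalDist


def pearson_r(x, y):
    """Product-moment coefficient of correlation between two series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must be of equal length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a series with no variation has no correlation")
    return sxy / sqrt(sxx * syy)


def exceeds_zero(r, n, alpha=0.01):
    """One-tailed test that r is greater than zero (Fisher z approximation)."""
    z = atanh(r) * sqrt(n - 3)
    return z > NormalDist().inv_cdf(1.0 - alpha)


if __name__ == "__main__":
    # Hypothetical ratings (1 = lowest, 5 = highest) of eight candidates.
    supervisor = [3, 4, 4, 5, 2, 3, 5, 4]
    co_worker = [4, 4, 5, 5, 3, 3, 4, 5]
    print(pearson_r(supervisor, co_worker))
    # With 100 candidates, compare coefficients of .15 and .25.
    print(exceeds_zero(0.15, 100), exceeds_zero(0.25, 100))
```

Under these assumptions a coefficient of .15 based on 100 candidates does not reach the 1% level, while one of .25 does.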


Results 


Relationship between ratings made by mem- 
bers of supervision and co-workers. The co- 
efficients of correlation between the ratings 
made by one member of supervision and one 
co-worker for each candidate on items com- 
mon to both rating scales are shown in Table 
1. The single supervisory rating was chosen 
at random from the two ratings made and the 
single co-worker rating was chosen at random 
from the three ratings made. Supervisory rat-
ings on quality of job performance and quan-
tity of work done have been compared with
the single rating on job performance given by 
co-workers. 

All of the correlations are rather low, rang- 
ing from .15 to .39; however, only the lowest 
coefficient is not significantly greater than 
zero. There is greatest agreement on the 
over-all rating of general fitness for promo-
tion. The data in Table 1 suggest that co-
worker and supervisory ratings do not dupli-
cate each other unnecessarily; and, at least
in this respect, the consideration of both types
of ratings in evaluating candidates for promo-
tion seems justified.

Table 1

Relationship Between Ratings Made by Members
of Supervision and Co-workers

Item Rated                        Coefficient of Correlation

Job knowledge                     .15
Job performance—Quality           .25
Cooperation                       .29
Job performance—Quantity          .33
General fitness for promotion     .39

The low degree of agreement between the 
ratings of supervisory personnel and co-work- 
ers indicates that many factors determining 
the ratings of the two groups are either not 
similar, or are not receiving the same rela- 
tive emphasis. Perhaps their standards of 
judgment, based on differences in scope and 
type of experience and present job status, ac- 
count for the lack of agreement. Their rat- 
ings may be determined by observations of 
different samples of behavior of the men being 
rated. On the other hand, the discrepancies 
in the ratings found here may be accounted 
for, in part, by differences of opinion on what 
characteristics are desired in a leader of the 
work group. Research on worker and super- 
visor attitudes with regard to how work 
groups should be led has suggested that such 
differences exist (1). 

The data reported here merely show that 
differences between the opinions of co-workers 
and members of supervision do exist; further 
research is necessary to identify the sources 
of these differences. 

Relationship between ratings made by pairs
of co-workers. The coefficients of correlation 
between ratings made by pairs of co-workers 
on the candidates are shown in Table 2. The 
coefficients indicate an agreement between 
pairs of co-workers which, although greater 
than zero, is moderately low to moderate. 


Table 2

Relationship Between Ratings Made by
Pairs of Co-workers

Item Rated                        Coefficient of Correlation

Cooperation
General fitness for promotion
Instruction ability
Job performance
Job knowledge
Total of all items


With one exception, the correlations be- 
tween ratings made by pairs of co-workers 
are higher than the correlations between co- 
workers and supervisory personnel. There 
is slightly less agreement among co-workers 
than between co-workers and supervisors on 
general fitness for promotion; however, the 
difference is not significant. 

When ratings given for all items on the 
rating form are combined, the coefficient ob- 
tained is slightly higher (though not sig- 
nificantly so) than for any individual item. 
The coefficients in Table 2 are in line with 
those reported in most studies of supervisory 
merit ratings (2, 3, 4). 

The greater agreement among co-workers 
than between co-workers and supervisors may 
reflect more similarity among the former than 
between the latter with respect to standards 
of judgment, behavior actually observed, and/ 
or opinions on what characteristics are de- 
sired in a leader of the work group. 

The fact that only moderate agreement is 
found indicates that the co-workers are far 
from being a homogeneous group with respect 
to attitudes toward their co-workers. 

The comparison here may be interpreted 
as a measure of the reliability of the co- 
worker ratings. The moderately low to 
moderate reliability of the ratings indicates 
that such ratings should not be used as the
sole basis for selection and that care must be 
taken in their interpretation. The relatively 
low reliability of co-worker ratings, as com- 
pared with reliability coefficients of other 
types of measures, should be considered in 
deciding on the weight of these ratings in the 
battery of measurements to be used in evalu- 
ating the candidates. 

Relationship between ratings made by pairs 
of supervisory personnel. The coefficients of 
correlation between the ratings made on each 
candidate by two members of supervision are 
shown in Table 3. 

The coefficients, ranging from .56 to .71, 
indicate a fairly high degree of agreement be- 
tween the members of supervision in rating 
workers on all items included in the rating 
scale. The over-all rating, general fitness for 
promotion, showed the highest degree of 
agreement although none of the differences 
between the items are clearly significant. 
The fairly high correlations indicate that 
members of supervision tend to base their rat- 
ings on similar observations of the workers’ 
performance and to judge the various char- 
acteristics according to similar standards. 

All of the coefficients reported in Table 3 
exceed those reported in the previous com- 
parisons and suggest a greater degree of 
agreement among members of supervision 
than among co-workers and between co-work- 
ers and members of supervision. 

If the relationship is interpreted as a meas-
ure of reliability, then the supervisory ratings
have a fairly high degree of reliability. The
greater consistency of the supervisory ratings
as compared with co-worker ratings suggests
that the former are more dependable.

Table 3

Relationship Between Ratings Made by
Pairs of Supervisors

Item Rated                        Coefficient of Correlation

Observing rules                   .56
Personal appearance               .61
Quality of work                   .61
Job knowledge                     .63
Drive                             .65
Quantity of work                  .66
Cooperation                       .67
General fitness for promotion     .71
Total of all items

Table 4

Distributions of Ratings by Supervisors and Co-workers

                                  Number of Ratings in Interval    Mean of    Standard
Item Rated and Rater                1     2     3     4     5      Ratings    Deviation

Job knowledge
  Supervisors                       1     8    73    80    38       3.73        .83
  Co-workers                        1     8    55    63    73       4.00        .92
Job performance—Quantity*
  Supervisors                       0     5    84    68    43       3.74        .82
  Co-workers                        1     6    44    76    73       4.07        .86
Job performance—Quality
  Supervisors                       0     1    56    86    57       4.00        .76
  Co-workers                        1     6    44    76    73       4.07        .86
Cooperation
  Supervisors                       0     8    90    44    58       3.76        .92
  Co-workers                        1    10    41    67    81       4.08        .94
General fitness for promotion
  Supervisors                       3    35    61    60    41       3.50       1.05
  Co-workers                        3    16    50    65    66       3.87       1.01

* Supervisory ratings on quality of work done and quantity of work done are compared with co-worker ratings
on job performance which included both quality and quantity.

Comparison of the distributions of ratings 
by members of supervision and co-workers. 
The distributions of the 200 ratings by
members of supervision and the 200 ratings by
co-workers on the items common to both rating
forms are shown in Table 4.

The ratings of supervisors tend to be more 
conservative than those of the co-workers. 
This is evident in a comparison of the propor- 
tions of ratings of the two groups which are 
in the highest interval in the rating scale (step 
5). For every characteristic rated a smaller 
proportion of the supervisory ratings is in the 
top interval than is true of co-worker ratings. 
In only one instance is the difference small 
enough to be attributed to chance (for job 
performance-quality, P = .10). 

The tendency of supervisors to give lower 
ratings than co-workers is shown also in a 
comparison of the means of the various items 
rated. In every instance the mean of the 
supervisory ratings is lower than the mean of 
the co-worker ratings. The differences are 
significant at the 1% level, or better, with the 
exception of job performance-quality, where 
P= .19. 
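
The means and standard deviations in Table 4 follow from the frequency counts, and the comparison of means can be sketched as below. The test shown is a one-tailed normal-curve test that treats the two sets of ratings as independent samples; this is a simplifying assumption, since both groups rated the same candidates, and the report does not state which test was used. Function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def mean_and_sd(counts):
    """Mean and standard deviation of ratings from interval frequencies.

    counts[i] is the number of ratings with value i + 1.
    """
    n = sum(counts)
    mean = sum(v * c for v, c in enumerate(counts, start=1)) / n
    variance = sum(c * (v - mean) ** 2 for v, c in enumerate(counts, start=1)) / n
    return mean, sqrt(variance)


def one_tailed_p(counts_lower, counts_higher):
    """Chance probability that the second mean exceeds the first this much."""
    m1, s1 = mean_and_sd(counts_lower)
    m2, s2 = mean_and_sd(counts_higher)
    se = sqrt(s1 ** 2 / sum(counts_lower) + s2 ** 2 / sum(counts_higher))
    return 1.0 - NormalDist().cdf((m2 - m1) / se)


if __name__ == "__main__":
    # Frequencies for intervals 1 through 5, taken from Table 4.
    knowledge_sup = [1, 8, 73, 80, 38]
    knowledge_cow = [1, 8, 55, 63, 73]
    quality_sup = [0, 1, 56, 86, 57]
    performance_cow = [1, 6, 44, 76, 73]
    print(mean_and_sd(knowledge_sup))
    print(one_tailed_p(knowledge_sup, knowledge_cow))
    print(one_tailed_p(quality_sup, performance_cow))
```

Under these assumptions the difference on job knowledge falls well beyond the 1% level, while that on quality of job performance does not, giving a probability of the same order as the one reported.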


Very few of the workers were rated in the 
lowest category by either supervisors or co- 
workers. Since there had been some prior 
selection of the men (they had been proposed 
for consideration by either members of super- 
vision or of Industrial Relations), it was ex- 
pected that seldom would a candidate be rated 
as very unsatisfactory in any factor. Al- 
though the frequencies in the second interval 
are higher than in the lowest interval, the 
second interval is used in fewer than 5% of 
the ratings except for the over-all rating. 
For the item, general fitness for promotion, 
approximately 18% of the ratings of super- 
visory personnel and about 8% of the ratings 
of co-workers are in the next to the lowest 
interval. 

The interval with the highest total fre- 
quency is the third, or average, interval for 
members of supervision and the fifth, or top, 
interval for co-workers. In the case of super- 
visors, for three of the five factors shown, the 
modal interval is step three, and for the other 
two factors is the fourth interval. For co- 
workers, step five is the modal interval for 
three of the four different factors shown. 
Thus, another method of analyzing the data 
shows that members of supervision tend to 
give lower ratings than do co-workers. 

Several explanations might be suggested 
for the relatively low ratings given by mem-
bers of supervision as compared with co-work-
ers. Perhaps the status of supervisory per-
sonnel results in more realistic, less personal
ratings. Also, members of supervision have 
had more training in the use of the rating 
form since many of them attend meetings of 
the Supervisory Selection Board. Some of 
them had reviewed the rating forms when the 
forms were being constructed. 

Comparison of the individual items on the 
rating forms. In a comparison of ratings as- 
signed to the various items shown in Table 
4, it appears that the distribution for the final 
over-all ratings on suitability for promotion 
differs from the distributions on the other 
factors of ratings by both supervisors and co- 
workers. For example, the mean of the rat- 
ings for this factor is significantly lower than 
the mean of the ratings assigned any of the 
other items. A greater proportion of the rat- 
ings on this factor are in the two lowest in- 
tervals (below average) than is true of any 
other factor; however, only in the case of the 
supervisory ratings are the differences clearly 
significant. 

The differences between the standard devia- 
tions for the ratings given by the two groups 
of raters are not statistically significant. The 
greatest variation in ratings of both groups is 
found in the ratings on general fitness for 
promotion. 

When the final item, suitability for promo- 
tion to leadman, is compared with the total of 
the ratings on all other items in the rating 
forms, the correlations obtained are .85 for 
supervisors and .85 for co-workers. The co- 
efficients approximate the ones reported in 
previous studies in which the same type of 
comparison was made (3, 4). 


Summary and Conclusions 


A group of 100 men who were candidates 
for promotion to leadman jobs in the manu- 
facturing division of an aircraft company 
were rated by members of supervision and by 
co-workers. Comparisons were made between 
ratings given each candidate by: (1) a mem- 
ber of supervision and a co-worker; (2) two 
members of supervision; and (3) two co- 
workers. The following conclusions are based 
on the results of these comparisons: 

1. There is a low, positive degree of rela-
tionship between the ratings given by super-
visory personnel and co-workers.

2. There is a slightly higher degree of agreement
between the ratings of pairs of co-workers
than between the ratings of members
of supervision and co-workers. The correlations
obtained indicate a moderately low
to moderate statistical reliability for the co- 
worker ratings. 

3. There is a much higher degree of agree- 
ment among the ratings given by members of 
supervision than among ratings given by co- 
workers. The correlations obtained indicate 
a fairly high statistical reliability for the su- 
pervisory ratings. 

4. Supervisory personnel tend to rate the 
men lower than do co-workers on all items 
common to the two rating forms as shown 
by consistently lower mean ratings, by lower 
modal intervals, and by a larger proportion 
of candidates considered below average on 
general fitness for promotion. 

5. Both members of supervision and co- 
workers tend to be somewhat more conserva- 
tive when rating the candidates on the over- 
all item, general fitness for promotion to lead- 
man, than when rating individual charac- 
teristics. 

6. There is a very high degree of relationship
between the total of ratings on all
separate characteristics and the ratings given
on the single item, general fitness for promotion.


Turnover Factors as Assessed by the Exit Interview


Many employees in the process of quitting
their jobs are in a mood to express feeling and 
speak frankly. If the enterprise maintains a 
formal exit interview in which the employee is 
assured that nothing he says will be used 
“against him” in any way, the tendency to- 
ward frankness and even catharsis is strength- 
ened. The exit interviewer thus is in a 
uniquely advantageous position to observe the 
dynamics of the turnover process. Different 
approaches (1, 5, 8, 10, 11) to study of turn- 
over are desirable; undoubtedly avoidable 
turnover differs from one enterprise to an- 
other in qualitative ways because of differing 
organizational climates and the patterns or 
syndromes of reasons for quitting should in 
part be products of these climates. 


Experimental Design 


On the assailable but necessary assumption 
that exit interviewers are adequate media for 
assessing the patterns of turnover, the follow- 
ing research was executed. A brief content 
analysis report ¹ was constructed, requesting
the exit interviewer to estimate how often in 
five typical interviews each of sixteen topics 
was “mentioned as a reason for leaving.” 
This report form with a cover letter was sent 
to the exit interviewer in each of 200 different, 
nationally representative companies (selected 
randomly from Poor’s). Of these, nineteen 
replied that they did no exit interviewing and 
two were returned unclaimed. Another two 
were returned with verbal explanations but 
without usable quantitative data. Forty- 
eight properly completed analyses were re- 
turned and utilized in this research. The 48 
companies are geographically representative
and report an annual exit interview case load
of 5075.

1 The questionnaire, copies of cover letters, a copy
of the exit content profile, the correlation matrix,
and the job satisfaction data used in comparison have
been deposited with the American Documentation
Institute. Order document No. 4054 from the ADI
Auxiliary Publications Project, Photoduplication
Service, Library of Congress, Washington 25, D. C.,
remitting $1.25 for microfilm (images 1 inch high
on standard 35 mm. motion picture film) or $1.25
for photoprint readable without optical aid.

Instrument Reliability. A split-half relia- 
bility coefficient for the content analysis re- 
port on the 48 returns was .81 which became 
.90 when corrected by the Spearman-Brown 
formula. 
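
The correction reported above can be checked in a few lines of Python. This is a minimal sketch rather than the author's own computation: the function name `spearman_brown` and its `factor` parameter are illustrative, and it assumes the standard Spearman-Brown formula with the test length doubled.

```python
def spearman_brown(r_half: float, factor: float = 2.0) -> float:
    """Estimate full-length reliability from a split-half coefficient.

    factor is how many times longer the full instrument is than the
    part whose reliability was measured (2 for a split-half).
    """
    return factor * r_half / (1.0 + (factor - 1.0) * r_half)


# Split-half coefficient reported for the exit interview content report.
exit_corrected = spearman_brown(0.81)

# Split-half coefficient reported later in the text for the counseling form.
counseling_corrected = spearman_brown(0.85)

print(round(exit_corrected, 2), round(counseling_corrected, 2))
```

Rounded to two places, the two results agree with the corrected values of .90 and .92 given in the text.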


Results 


Exit Interview Content Profile. According 
to reports from these 48 companies, each topic 
is mentioned as a reason for leaving as follows 
(per five representative interviews): pay, 
1.89; transportation, 0.81; promotion, 0.73; 
working conditions, 0.69; poor health, 0.64; 
job security, 0.54; friction with co-workers, 
0.52; poor housing or excessive rents, 0.50; 
personal happiness as affected by job experi- 
ence, 0.33; ability of supervisor, 0.33; broken 
promises by supervisor, 0.25; confidence in 
management, 0.19; company interest in em- 
ployee welfare, 0.15; freedom of communica- 
tion with higher levels, 0.12; recreation, 0.04; 
method of wage payment, 0.02; other prob- 
lems, 1.15. 
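
The profile above is easier to inspect when tabulated. The following is a small sketch that simply re-enters the sixteen reported frequencies; the dictionary keys are shortened labels chosen here for illustration, and nothing beyond the printed figures is assumed.

```python
# Mentions per five representative interviews, as reported in the text.
profile = {
    "pay": 1.89,
    "transportation": 0.81,
    "promotion": 0.73,
    "working conditions": 0.69,
    "poor health": 0.64,
    "job security": 0.54,
    "friction with co-workers": 0.52,
    "poor housing or excessive rents": 0.50,
    "personal happiness as affected by job": 0.33,
    "ability of supervisor": 0.33,
    "broken promises by supervisor": 0.25,
    "confidence in management": 0.19,
    "company interest in employee welfare": 0.15,
    "freedom of communication": 0.12,
    "recreation": 0.04,
    "method of wage payment": 0.02,
}
other_problems = 1.15  # reported separately from the sixteen topics

# Rank the topics from most to least frequently mentioned.
ranked = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)
top_topic, top_rate = ranked[0]
runner_up, runner_up_rate = ranked[1]

# How much more often the leading topic is mentioned than the next one.
ratio = top_rate / runner_up_rate

# Average number of reasons mentioned in a single interview.
mentions_per_interview = (sum(profile.values()) + other_problems) / 5

print(top_topic, runner_up, round(ratio, 2), round(mentions_per_interview, 2))
```

On these figures pay is mentioned more than twice as often as the next topic, transportation.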

Comparison of Content Profile with Job 
Satisfaction Data. Most of the above topics 
are included in a widely used job satisfaction 
survey form (7). When some typical survey 
results (9) were compared with these exit 
interview content data, it was found that pay 
was the foremost grievance in both. Working 
conditions also was a major grievance in both. 
Among other topics, however, the agreement 
was moderate or low. 

The results just quoted which emphasize 
employee concern about pay may seem to 
contradict some previous research. It is true 
that many researchers (2, 3, 4, 6, 12, and 
others) have found that when employees are 
asked what they consider “most important” 
in their jobs, they do not put pay as foremost 
in importance. Actually, such results are not 
in contradiction to the job satisfaction survey 
and exit interview results, because the “importance
ranking” studies represent research
on an entirely different variable. “Factor
importance in a job” is not the same thing as
what the employee is happy or unhappy about
in a job. The factor importance ranking studies
referred to above are cast in an abstract,
theoretical frame of reference for the employee 
respondent. They get at his set of philo- 
sophical values. But the job satisfaction sur- 
vey and exit interview get at something dif- 
ferent: not the importance but the satisfactory 
or unsatisfactory condition of each factor in 
current work experience. Generalizing from 
all these related researches, we might suggest 
that employees in general concede that pay is 
not the most important factor in a job, but
they nevertheless feel that it represents a
foremost grievance factor. This generaliza- 
tion is bolstered by a non-attitudinal turnover 
study, revealing pay as a foremost objective 
correlate of turnover (8). 

Comparison of Exit Content Profile with 
Routine Personnel Counseling Profile. A con- 
tent analysis report form substantially iden- 
tical to the one used for exit data except that 
it was focused upon routine personnel coun- 
seling was constructed and sent to 39 com- 
panies which at some time had had counseling 
programs. The eight firms which finally co- 
operated returned reports from 22 personnel 
counselors who reported serving an annual
total load of approximately 30,000 cases. A
sixteen topic profile was computed on these 
22 reports. The split-half reliability was .85 
which corrected to .92. When these profile 
values were correlated with the similarly- 
derived exit interview profile values rho was 
found to be .74. Apparently the content of 
routine personnel counseling interviews is very 
similar to the content of the exit interview. 
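
The rank correlation (rho) between two topic profiles can be computed as sketched below. The counseling profile values are not given in the text, so the second list in the example is purely hypothetical, and the function names are illustrative. The sketch assigns average ranks to ties and then takes the product-moment correlation of the ranks, which is one common way of obtaining rho; whether the original computation handled ties this way is not stated.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Rank correlation: product-moment correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# First five exit interview frequencies from the text.
exit_values = [1.89, 0.81, 0.73, 0.69, 0.64]
# HYPOTHETICAL counseling frequencies, invented only to exercise the function.
counseling_values = [1.20, 0.90, 0.40, 0.70, 0.50]

rho = spearman_rho(exit_values, counseling_values)
```

With all sixteen topic values from both profiles in place of the short lists, the same call would give the coefficient the text reports as .74.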

Pattern Structure of Exit Interview Con- 
tent. Intercorrelations were computed among 
the sixteen exit interview content topic frequencies
for the 48 companies. A simple
linkage-type cluster analysis of the resulting 
matrix was then performed in an effort to 
isolate the most characteristic exit patterns or 
climates. Results of this analysis are shown 
schematically in Figure 1. 

Perhaps the most conspicuous single out- 
come of the analysis is the presence in four 
of the five clusters of “ability of the super- 
visor.” Even “poor health,” which correlates 
with nothing else as a reason for quitting, 
correlates .75 with (lack of) “ability of super- 
visor.” The question of whether the super- 
visor is a convenient scapegoat for the em- 
ployee in poor health or whether he bears a 
causal psychosomatic relationship to employee 
poor health is not answered in these data. 
The triad in which the supervisor also figures 
along with “transportation” grievances and 
lack of “confidence in management” suggests 
an interesting pattern in some companies. It 
appears probable that employees living far 
away from the plant have more transportation 
difficulties and therefore are tardy or absent 
more frequently than other personnel. The 
supervisor (of this pattern) categorizes men- 
tally and orients to these employees as being 
of the “less dependable, tardy, absentee type.” 
Gradually the distant-living employee per- 
ceives this apparent untrusting attitude in 
the supervisor, and he develops a reciprocal 
lack of confidence in the management. Even- 
tually, according to this plausible interpreta- 
tion, he ends up in the exit interview com- 
plaining about transportation, the ability of 
the supervisor, and lack of confidence in man- 
agement. 

Three other factors, climates, or syndromes 
appear. A general human relations pattern
emphasizes concern with broken promises by
supervision, friction with co-workers, com- 
pany interest in employee welfare, freedom 
of communication, promotion, ability of su- 
pervisor, confidence in management, and job 
effect on personal happiness. A security pat- 
tern emphasizes complaints about job security, 
working conditions, confidence in manage- 
ment, ability of supervisor, interest in em- 
ployee welfare, broken promises by super- 
visor, job effect on personal happiness, and 
friction with co-workers. 

An upgrade pattern is evident in a tend- 
ency toward simultaneous complaint about 
promotion, pay, and freedom of communi- 
cation. Poor housing complaint is also in- 
cluded, correlating negatively with pay com- 
plaint but positively with grievances about 
promotion and freedom of communication. 
The “rush to get ahead” syndrome is apparent 
here. Promotion and access to “higher ups” 
are considered important—along with pay or 
satisfying family housing. These employees do not feel
that the present employment permits them to 
meet their pay and prestige goals fast enough. 

In interpretation it should be noted that 
these patterns may, in fact, represent the turn- 
over-inducing climates operating in parts of 
the 48 enterprises reported on. Unfortu- 
nately, to an unknown degree, it is possible 
that the patterns may be influenced by frames 
of perceptual reference of the interviewers 
themselves. Even allowing for some common 
sets among interviewers, it still seems prob- 
able that their reports must also have been in- 
fluenced by what they have seen, heard, and 
sensed in their daily work of exit interviewing. 


Summary 


Forty-eight exit interviewers in as many 
companies supplied topical analyses of exit 
interview content. These data, ostensibly 
products of differing turnover climates, were 
summarized by topic, intercorrelated, and 
analyzed to suggest the following conclusions. 

1. Pay grievances were mentioned twice as 
frequently as any other single topic of com- 
plaint. Next in order of complaint were 
transportation, promotion, working conditions, 
poor health, job security, co-workers, housing, 
the job, supervisor, confidence in management,
interest in employee welfare, freedom
of communication with higher levels, recrea- 
tion, and method of wage payment. 

2. The relatively heavy emphasis upon pay 
and working conditions agrees with the heavy 
emphasis assigned by regular employees them- 
selves in job satisfaction surveys, and with 
turnover correlates, but disagrees with “factor 
importance ranking” studies. Otherwise, exit 
interview topic emphasis agrees only moder- 
ately with “per cent dissatisfied” on job satis- 
faction surveys of non-quitting personnel. 

3. When 22 regular (not exit) personnel 
counselors submitted reports of content of 
their routine counseling, their mean profile of 
topic frequencies was found to correlate (rho) 
.74 with the mean profile obtained on the 48 
exit interviewers. Apparently there is much 
in common among the frustrations expressed 
by employees who are quitting and by em- 
ployees still on the job. 

4. A cluster analysis of exit topic frequency
intercorrelations was performed, with the following
climatic patterns resulting: a human
relations syndrome; a security syndrome; an
upgrade syndrome; a transportation-confidence
triad; and an unnamed duad.


It is customary in cases where personal
data are used as predictors of various criteria 
of job performance to use as predictors all of 
the items which show a significant relation- 
ship with the criterion. The drawback to this 
mode of operation lies in the possibility of 
some of the included items contributing more 
error in prediction than they do to actual va- 
lidity. 

This phenomenon can best be explained in 
terms of item and criterion variance. A hy- 
pothetical illustration will be given in terms 
of two items. Item 1 shares 20 per cent of 
its variance in common with the criterion, in- 
cludes 50 per cent specific variance and 30 
per cent error variance. The second item 
shares 15 per cent of its variance with the 
criterion, but 12 of these 15 per cent are in 
common with the 20 per cent shared by item 
1. Further, 50 per cent of the variance for 
item 2 is specific, and 35 per cent is error 
variance. By adding item 2 to item 1, three 
per cent more of the criterion variance is ac- 
counted for; but at the same time 35 per cent 
additional error variance is introduced. Thus, 
adding item 2 would result in shrinking the 
validity of item 1. 
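
The arithmetic of this hypothetical illustration can be laid out explicitly. The sketch below only restates the percentages given above; the variable names are illustrative, and no model of how the variances combine in a composite score is assumed.

```python
# Per cent of each item's variance, as given in the hypothetical illustration.
item1 = {"criterion": 20, "specific": 50, "error": 30}
item2 = {"criterion": 15, "specific": 50, "error": 35}

# Part of item 2's criterion-related variance that item 1 already covers.
overlap_with_item1 = 12

# Each item's components should account for all of its variance.
for item in (item1, item2):
    assert sum(item.values()) == 100

# What item 2 contributes that item 1 does not already supply.
new_criterion_variance = item2["criterion"] - overlap_with_item1
added_error_variance = item2["error"]

print(new_criterion_variance, added_error_variance)
```

The gain of 3 per cent in criterion variance is small next to the 35 per cent of error variance that comes with it, which is the point of the illustration.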

In a specific situation, then, the problem is 
one of selecting a number of items from a 
pool of items so that the selected items give 
a maximum relationship with the criterion. 
The Wherry-Doolittle Technique ³ achieves
this goal for data where item validities and 


1 This article is based on part of a Ph.D. disserta- 
tion done under the direction of Professor E. J. Mc- 
Cormick. The dissertation is on file in the Purdue 
University library under the title “Personal Data as 
Predictors of the Job Behavior of Telephone Op- 
erators.” 

2 The author wishes to express his gratitude to the 
General Telephone Company of Michigan, Mr. F. E. 
Norris, President, whose cooperation made this study 
possible. In this regard, a special word of thanks is 
due Dr. Melvin Tieszen, formerly Personnel Director 
of the Company and now affiliated with Booz, Allen 
and Hamilton, New York. 

3 Stead, W. H., Shartle, C. L., et al. Occupational 
counseling techniques. New York: American Book 
Company, 1940, pp. 253-255. 


inter-relationships can be expressed in terms 
of coefficients of correlation. For categorical 
data, however, where item versus criterion 
relationships are expressed in 2 × 2, 2 × 3, or
2 × k contingency tables, neither item validities
nor item inter-relationships can be expressed
in correlational terms (except for those
recorded in 2 × 2 tables). Regardless of the
data format, the problem of shrinkage re- 
mains the same. The quartile difference 
method proposed here does essentially with 
categorical data what the Wherry-Doolittle 
Technique achieves with correlational data. 

The mechanics of this method will be illus- 
trated with four application blank items that 
were found to be related to the tenure of tele- 
phone operators at the 10 per cent significance 
level or better on the basis of analysis with a 
primary group of 171 operators. Table 1 
lists the four items that were considered to be 
significantly related to tenure, the per cent of 
high and low criterion cases in each category, 
and the scoring weights for the various cate- 
gories. In addition, the value of chi square 
for each item along with its degrees of free- 
dom and probability level are listed. 


Method 


The quartile difference method involves the 
following steps: 


1. Divide the total sample of cases into a pri- 
mary group and a holdout group. Working with 
the primary group, compute chi square for each 
item. Then, compute scoring weights for the 
various response categories of the items that are 
significantly related to the criterion. For the 
illustrative case, these results are presented in 
Table 1. 

Table 1

The Four Items Related to Tenure, Per Cent of High and Low Tenure Cases in Each Response Category, Scoring Weights, Item Chi Square with its Degrees of Freedom and Probability Level

Item Categories   High Tenure   Low Tenure   Weights*   Chi Square   D.F.   P

1. Height-weight ratio        4.931   2   .09
   2.00-2.04                  14   28
   1.70-1.99                  47   .   4
   1.45-1.69                  39   34

2. Marital status
   single
   married

3. When consult physician
   no mention                 31
   0-9 months                 41
   9 months +                 28
                              100

4. Education
   below high                 21   12
   high grad.                 69   63
   above high grad.           10   25
                              100  100

* The scoring weights were arrived at by subtracting the per cent of low tenure cases from the per cent of
high tenure cases for each category and adding a constant, +22, to eliminate negative weights.


2. List the responses of subjects in the holdout
group to the items which were considered to
be significantly related to the criterion and assign
the scoring weights as determined in Step 1 to
these responses.⁴ For example, if a subject in
the holdout group fell in the 1.70-1.99 height-weight
ratio category, was single, had consulted
a physician more than nine months prior to the
time of application and had a high school education,
the scoring weights (taken from Table 1)
for this subject would be listed as follows: 4, 36,
35, 28.

4 The step outlined here follows a cross validation
procedure with a holdout group of 176 cases. While
the item selection technique may be carried out with
a single group of employees, it is strongly recommended,
if it is at all feasible, that the cross validation
procedure be used.

3. Select as the first item to be included in the 
battery that item which demonstrated the highest 
relationship with the criterion as determined in 
Step 1. In this case the item selected was num- 
ber 4 (Education) with a probability level of .02. 

4. List the scoring weights for the first selected
item in order of magnitude (from high to low)
and tally the frequencies of high criterion and
low criterion cases in the holdout group at each
scoring weight. Split the total distribution of
cases at the various scoring weights into an upper
quarter, middle half and lower quarter.⁵ Then,
compute the per cent of high criterion cases in
the upper quarter (Q1), middle half (Q2), and lower
quarter (Q3). The difference in per cent of high
criterion cases between the upper and lower quartiles,
Q1 - Q3, serves as a measure of item, or
item combination, discrimination. These computations
for item 4 are presented as the zero order
of analysis in Table 2.

5. Plot the Q1, Q2, and Q3 values obtained from
the zero order analysis on a shrinkage chart, Figure 1.
The horizontal line at the 60 per cent
point represents the per cent of high criterion
cases in the holdout group.

6. Combine the scoring weights for the first
selected item with the scoring weights for each of
the remaining items for every subject in the holdout
group. This procedure will result in as many
new distributions of scoring weights as there are
items to pair with the first selected item. List
the combined scoring weights for each pair of
items in order of magnitude (from high to low)
and tally the frequencies of high criterion and
low criterion cases in the holdout group at each
scoring weight for each distribution of scoring
weights. Once again split the total distribution
of cases at the various scoring weights for each
item combination into upper quarter, middle
half and lower quarter and compute the per cent
of high criterion cases for each of these categories.
For this first order analysis the Q1, Q2,
and Q3 values as computed for each pair of items
are entered in the recording sheet, Table 2. The
second item to be selected for the battery is that
item which, when combined with the first selected
item, yields the highest Q1 - Q3 value. In
this case, the second item to be selected was item
2 (Marital Status) which, with item 4, yielded
the highest Q1 - Q3 value, namely 52.

5 If the first selected item is dichotomous, it would
be, of course, impossible to split the total distribution
of holdout cases at the various scoring weights
into high quarter, middle half and low quarter since
the cases in the holdout group are tallied at only
two scoring weights. In situations such as this, the
scoring weights for the first selected item should be
immediately combined with the scoring weights for
the remaining items as is indicated in Step 6.
In other words, the zero order of analysis is bypassed
and the researcher goes immediately to the
first order of analysis.

7. Plot the Q1, Q2, and Q3 values for the best
two items as selected by the first order analysis
on the shrinkage chart, Figure 1. In this case,
the Q1, Q2, and Q3 values plotted for the first
order analysis were the values obtained for items
4 and 2 from Table 2. For the illustrative case,
examination of Figure 1, the shrinkage chart, at
this point reveals that the addition of item 2 increases
the validity of the composite as compared
with the validity of item 4 alone (the distance
between the Q1 and Q3 values continues to spread,
indicating increased efficiency in prediction or
item combination validity). Consequently, the
item selection procedure is continued.

8. Combine the composite scoring weights for
the first two selected items with the scoring
weights for each remaining item. Once again follow
the computational procedure outlined in steps
4 and 6 above. For this second order analysis,
the Q1, Q2, and Q3 values are computed for each
item triad and recorded in Table 2. The third
item to be selected for the battery is that item
which, when combined with the first two selected
items, yields the highest Q1 - Q3 value. In this
case the third item to be selected was item 1
(Height-Weight Ratio) which, when combined
with items 4 and 2, yielded the highest Q1 - Q3
value, namely 47.


Table 2

Recording Sheet for Quartile Difference Method of
Item Selection Computations

Order of                  % High Criterion                Item(s)
Analysis    Item(s)       Q1      Q2      Q3     Q1-Q3*   Selected

0           4                                    38       4

1           4,1                                  15
            4,2           88      66      36     52       2
            4,3           61      58      37     24

2           4,2,1         76      63      29     47       1
            4,2,3         66      63      32     34

3           4,2,1,3       69      58      44     25       3

* The highest Q1-Q3 value for each order of analysis is indicated
by italics.


Fig. 1. Shrinkage chart for the Quartile Difference
Method of item selection. (Vertical axis: per cent high
criterion cases; horizontal axis: order of analysis.)

9. Plot the Q1, Q2, and Q3 values for the best
three items as selected by the second order analysis
on the shrinkage chart, Figure 1. In this case
the Q1, Q2, and Q3 values plotted for the second
order analysis were the values obtained for items
4, 2 and 1. Examination of the shrinkage chart
at this point reveals that the addition of item 1
has attenuated the per cent of high criterion cases
in Q1, but has continued to decrease the per cent
of high criterion cases in Q3. Since the over-all
index of item discrimination, the Q1 - Q3 value,
shows a drop of five per cent from the first to
second order analysis, the researcher might profitably
stop selecting items at this point in the selection
procedure.

The analysis in this case was continued to in- 
clude all four items. Computations for this third 
order analysis are recorded in Table 2 and plotted 
in Figure 1. Examination of Figure 1 reveals 
that the inclusion of item 3 results in further 
shrinkage (the distance between the Q1 and Q3
values decreases). In fact, the predictive effi- 
ciency of all four items appears to be less than 
that of the best single item. 
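
The selection steps above can be sketched in Python. This is an illustration under stated assumptions, not the author's procedure verbatim: the function names are invented here, the quarter splits are made by rank at n // 4 from each end (the method itself approximates the splits at scoring-weight boundaries), the stopping rule halts as soon as Q1 - Q3 fails to rise, and the special handling of a dichotomous first item is omitted. The toy data at the end are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def scoring_weight(pct_high: float, pct_low: float, constant: float = 22) -> float:
    """Category weight: per cent high minus per cent low, plus a constant."""
    return pct_high - pct_low + constant


def quartile_values(scores: Sequence[float],
                    is_high: Sequence[int]) -> Tuple[float, float, float]:
    """Per cent of high criterion cases in the upper quarter (Q1),
    middle half (Q2) and lower quarter (Q3) of the score distribution.

    Assumption: cases are ranked by score and cut at n // 4 from each end.
    """
    n = len(scores)
    if n < 4:
        raise ValueError("need at least four cases")
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    k = n // 4
    groups = (order[:k], order[k:n - k], order[n - k:])
    q1, q2, q3 = (100.0 * sum(is_high[i] for i in g) / len(g) for g in groups)
    return q1, q2, q3


def select_items(weights: Dict[str, List[float]],
                 is_high: Sequence[int],
                 first_item: str) -> Tuple[List[str], List[float]]:
    """Forward selection on the holdout group.

    weights maps each item to the per-subject scoring weights;
    first_item is the item chosen from the primary-group analysis.
    Returns the selected items and the Q1 - Q3 value at each order.
    """
    composite = list(weights[first_item])
    selected = [first_item]
    q1, _, q3 = quartile_values(composite, is_high)
    history = [q1 - q3]
    remaining = [name for name in weights if name != first_item]
    while remaining:
        # Try each remaining item added to the current composite.
        trials = {}
        for name in remaining:
            trial = [c + w for c, w in zip(composite, weights[name])]
            t1, _, t3 = quartile_values(trial, is_high)
            trials[name] = t1 - t3
        best = max(trials, key=trials.get)
        if trials[best] <= history[-1]:
            break  # assumed stopping rule: no further gain in Q1 - Q3
        composite = [c + w for c, w in zip(composite, weights[best])]
        selected.append(best)
        history.append(trials[best])
        remaining.remove(best)
    return selected, history


# HYPOTHETICAL holdout group of eight cases (1 = high criterion case).
is_high = [1, 0, 1, 1, 0, 1, 0, 0]
weights = {
    "A": [30, 30, 30, 30, 10, 10, 10, 10],
    "B": [20, 5, 20, 20, 5, 20, 5, 5],
    "C": [5, 20, 5, 5, 20, 5, 20, 20],
}
selected, history = select_items(weights, is_high, first_item="A")

# Weight for a category with 69 per cent high and 63 per cent low cases.
example_weight = scoring_weight(69, 63)
```

In the toy data, item B sharpens the separation obtained with item A alone, while item C would shrink it, so selection stops after two items. The returned history plays the role of the Q1 - Q3 column of the recording sheet.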

The inclusion of the shrinkage chart in the 
procedure is a refinement but by no means a 
necessity. The researcher could perhaps as effec- 
tively determine when to stop adding items by 
examining the trend in Q1 - Q3 values for each
successive composite of selected items as is indi- 
cated on the recording sheet, Table 2. Trends 
in the data for all of the quartile values, how- 
ever, become more apparent with the shrinkage 
chart, and for this reason, it is probably well
worth the additional labor needed for its construction.


Fig. 2. Per cent of high tenure operators for various combinations
of the combined marital status and education categories.


Results 


On the basis of the results provided by the 
item selection technique, two of the original 
four items that were considered to be signifi- 
cantly related to tenure (Education and Mari- 
tal Status) were chosen to compose the selec- 
tion battery. The per cent of high tenure op- 
erators in the holdout group for each com- 
bination of response categories for the two 
items are presented in Figure 2. 

It was felt that presenting these results in 
terms of combined response categories would 
be more meaningful than presenting them in 
terms of composite scoring weights. Actually, 
the listing of combined category responses 
corresponds to scoring weight magnitudes from 


high (top) to low (bottom). A rather im- 
pressive and uniform drop in per cent of high 
tenure operators occurs from the first cate- 
gory (single, below high school education) to 
the last category (married, above high school 
education). The trend, while in the right di- 
rection, is somewhat stabilized for the two 
middle categories (single, above high school 
education and married, below high school edu- 
cation). The per cent of high criterion opera- 
tors in the former is 53 and in the latter, 50. 
Chi square was computed for the contingency 
table composed of frequencies of high and low 
tenure operators at the six combinations of 
response categories in order to test the hy- 
pothesis that the relationship expressed here 
could be attributed to chance. The null hypothesis
was rejected at better than the 1%
significance level.


Summary 


An item selection technique for categorical 
data, the quartile difference method, was de- 
veloped to help the researcher select the most 
highly predictive combination of items from 
a pool of possible predictors. The technique,
while not completely precise (quarter splits
have to be approximated, which affects the
precision of the quartile values for the various
analyses), does provide a systematic procedure
for the selection of categorical predictors. The
mechanics of the method were demonstrated with
four items that were found to be related to the
tenure of telephone operators on the basis of item
analyses with a primary group. It was found
that a combination of two of these items
(Education and Marital Status) appeared to
be more highly predictive of the criterion than
was any other combination of items. In this
regard, the per cent of high tenure operators
decreases as marital status changes from single
to married and as education increases.

Norman Friedman


Received December 19, 1952.


This report describes an attempt to develop
a brief personality scale to predict college un- 
dergraduate course grades, and particularly 
undergraduate course grades in psychology. 
The study was undertaken with the expecta- 
tion that its findings would contribute to a 
broader understanding of some of the non-in- 
tellective factors relating to academic achieve- 
ment, particularly those factors having to do 
with personal values, beliefs, and self-defini- 
tions. The construction of the scale repre- 
sents one of a series of studies devoted to the 
measurement of positive and favorable as- 
pects of personality and individual function- 
ing being carried out by the writer. The pres- 
ent scale, along with a number of earlier scales 
for such factors as social participativeness, 
dominance and leadership, social responsi- 
bility, and intellectual efficiency, is included 
as a sub-test in the California Psychological 
Inventory.³

The first step in constructing the present 
scale was to assemble a pool of criterion-spe- 
cific personality inventory items. The writ- 
ing and selection of beginning items was based 
upon three general sources: previous find- 
ings, theories about academic motivation and 
achievement, and intuitive hunches about con- 
tributory factors. There is not space in this 
report to do more than refer to the procedures
used in writing and selecting items, but it
should be emphasized that a major factor in
the possible success of any endeavor such as
the present one is the veridicality and authenticity
of the items themselves. No
amount of analytical precision at some later
time can overcome the limitations of an inept,
superficial, or tangential pool of items. It is
the writer’s belief that many psychological
studies on the prediction of complex criteria
from personality inventory data have floundered
because of failure to observe this simple,
but fundamental, prerequisite.

1 This project was carried out under a research
grant from the National Institute of Mental Health,
National Institutes of Health, U. S. Public Health
Service.

2 This paper is a revision and extension of a preliminary
version given at the annual meetings of the
American Psychological Association in Washington,
D. C., September 1, 1952.

3 The complete bibliography for the inventory is
too long for inclusion here. For selected references
see: (1), (2), (4), (5), and (6).

Four original samples were obtained for 
item analysis. These consisted of introduc- 
tory psychology classes at the University of 
California, the University of Minnesota, and 
Vanderbilt University. Each item was studied 
in at least three of the four samples, and all 
items revealing discriminatory power in each 
instance were retained. Table 1 lists five of 
the items and the basic item analysis statistics.⁵


The Items 


Altogether, 36 items ⁶ from the pool of 150
items were retained for the first version of the 
scale, called Hr (for honor point ratio) to dis- 
tinguish it from an Ac—high school academic 
achievement—scale developed earlier by the 


4 These samples were very kindly made available 
by Drs. John Gustad, Rheem Jarrett, and Miles A. 
Tinker. 

5 A longer version of Table 1 giving the item per- 
centages and significance tests for the complete scale 
has been deposited with the American Documentation 
Institute. Order Document 3947 from the ADI Aux- 
iliary Publications Project, Photoduplication Service, 
Library of Congress, Washington 25, D. C., remitting 
$1.25 for microfilm (images 1 inch high on standard 
35 mm. motion picture film) or $1.25 for photoprint 
readable without optical aid.

6 Sixteen of these 36 items were taken, by per- 
mission, from the Minnesota Multiphasic Personality 
Inventory. (Hathaway, S. R., and McKinley, J. C. 
The Minnesota Multiphasic Personality Inventory. 
Minneapolis: University of Minnesota Press, 1943.) 
The 16 items in the MMPI and the scored responses 
are as follows: 33F, 78T, 122T, 157F, 248F, 250F, 
260F, 287F, 295T, 313F, 395F, 437F, 448F, 469F,
492F, 498F. 
Table 1

Sample Items from the Hr (Honor Point Ratio) Scale Distinguishing between Students with
Higher and Lower Course Grades

Proportion in Each Sample Saying “True”

            California Class        Minnesota Class         Vanderbilt Class
Item        Higher      Lower       Higher      Lower       Higher      Lower
            (N = 50)    (N = 50)    (N = 40)    (N = 40)    (N = 20)    (N = 20)





. Lawbreakers are almost al- 44 
ways caught and punished. 

. For most questions there is 
just one right answer, once a 
person is able to get all the 
facts. 

. It is annoying to listen to a 
lecturer who cannot seem to 
make up his mind as to what 
he really believes. 

. The future is too uncertain 
for a person to make serious 
plans. 

. Teachers often expect too 
much work from the stu- 
dents. 


22 


62 


46 


48 58 25 50 


30 38 20 ~ 





writer (4). The 36 items, and the responses
predictive of higher grades, are given below:


1. I have had very peculiar and strange experi- 
ences. (F). 2. I have very few fears compared 
to my friends. (F). 3. I usually take an active 
part in the entertainment at parties. (F). 4. It 
is always a good thing to be frank. (F). 5. I 
don’t blame anyone for trying to grab all he can 
get in this world. (F). 6. I was a slow learner 
in school. (F). 

7. Sometimes without any reason or even when 
things are going wrong I feel excitedly happy, “on 
top of the world.” (F). 8. Parents are much too 
easy on their children nowadays. (F). 9. Teach- 
ers often expect too much work from the stu- 
dents. (F). 10. I think I would like to fight in 
a boxing match sometime. (F). 11. I have often 
found people jealous of my good ideas, just be- 
cause they had not thought of them first. (F). 
12. People pretend to care more about one an- 
other than they really do. (F). 

13. The future is too uncertain for a person to 
make serious plans. (F). 14. The man who pro- 
vides temptation by leaving valuable property 
unprotected is about as much to blame for its 
theft as the one who steals it. (F). 15. I dread 
the thought of an earthquake. (F). 16. I am 
bothered by people outside, on streetcars, in 
stores, etc., watching me. (F). 17. I feel that 
I have often been punished without cause. (F). 


18. I seem to be about as capable and smart as 
most others around me. (T). 

19. I like poetry. (T). 20. It is annoying to 
listen to a lecturer who cannot seem to make 
up his mind as to what he really believes. (F). 
21. I like to plan a home study schedule and then 
follow it. (F). 22. Our thinking would be a lot 
better off if we would just forget about words 
like “probably,” “approximately,” and “perhaps.”
(F). 23. For most questions there is just one 
right answer, once a person is able to get all the 
facts. (F). 24. It is all right to get around the 
law if you don’t actually break it. (F). 

25. I often lose my temper. (F). 26. I some- 
times feel that I am a burden to others. (F). 27. 
I looked up to my father as an ideal man. (F). 
28. Law-breakers are almost always caught and 
punished. (F). 29. I liked “Alice in Wonder- 
land” by Lewis Carroll. (T). 30. I have a tend- 
ency to give up easily when I meet difficult prob- 
lems. (F). 

31. The trouble with many people is that they 
don’t take things seriously enough. (F). 32. 
Only a fool would try to change our American 
way of life. (F). 33. Even when I do sit down 
to study it is hard to keep my mind on the as- 
signment. (F). 34. It is often hard for me to 
understand what the questions are driving at in 
a school test. (F). 35. I have to wait for the
right mood before I can sit down and study. (F). 
Table 2

Summary Statistics for the Original Samples on the Hr Scale

                                                                              r with
                                                                              Course
Sample                                                   N      M      SD     Grades

1. Introductory psychology class at California,
   June, 1950.*                                          180    15.7   2.6    .42
2. Introductory psychology class at Minnesota,
   August, 1949.**                                       67
3. Introductory experimental psychology class at
   Minnesota, October, 1950.                             270
4. Introductory psychology class at Vanderbilt,
   October, 1950.                                        86

24.9 4.0

21.9 44

* Took only 24 of the 36 items in the full scale.
** Took only 12 of the 36 items in the full scale.


36. I plan very carefully about which school 
courses I will take. (T). 
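
As an illustration of how a protocol might be scored against the key given above, the following Python sketch counts one point for each response made in the scored direction. The unit weighting, the function name `hr_score`, and the treatment of omitted items are assumptions made for illustration; the paper lists the keyed responses but does not spell out a scoring procedure.

```python
# Items keyed "True" in the list above; all other items are keyed "False".
TRUE_KEYED = {18, 19, 29, 36}
N_ITEMS = 36


def hr_score(responses, n_items=N_ITEMS):
    """Count responses that agree with the scored direction.

    responses: dict mapping item number (1-based) to True or False.
    Items missing from the dict are treated as omitted and earn no point
    (an assumption, not a rule stated in the paper).
    """
    score = 0
    for item in range(1, n_items + 1):
        if item not in responses:
            continue
        keyed_true = item in TRUE_KEYED
        if responses[item] == keyed_true:
            score += 1
    return score


# Hypothetical respondent who answers "False" to every item.
all_false = {item: False for item in range(1, N_ITEMS + 1)}
print(hr_score(all_false))
```

Calling `hr_score(responses, n_items=32)` restricts the count to the first 32 items, which corresponds to dropping the last four items of the list.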


Results 


This Hr scale was correlated with course 
grades in the original four samples, totalling 
603 cases, with the results indicated in Table 
2. The median r is .47, and the mean r, using
the z-transformation, is .48.
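
The averaging of correlations through the z-transformation can be sketched as follows. Each r is converted to Fisher's z (the inverse hyperbolic tangent), the z values are averaged, and the mean z is converted back to r. The coefficients in the example are hypothetical, and the optional weighting of each z by N - 3 is an assumption; the paper does not say whether the mean was weighted.

```python
import math


def fisher_mean_r(rs, ns=None):
    """Average correlations via Fisher's z-transformation.

    rs: list of correlation coefficients.
    ns: optional list of sample sizes; if given, each z is weighted by
        n - 3 (a common convention, assumed here, not taken from the paper).
    """
    zs = [math.atanh(r) for r in rs]
    if ns is None:
        weights = [1.0] * len(rs)
    else:
        weights = [n - 3 for n in ns]
    mean_z = sum(w * z for w, z in zip(weights, zs)) / sum(weights)
    return math.tanh(mean_z)


# Hypothetical coefficients, for illustration only.
example_rs = [0.30, 0.45, 0.50, 0.55]
print(round(fisher_mean_r(example_rs), 2))
```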


Table 3

Summary Statistics for the Cross Validating Samples Given the Full 36-Item Hr Scale

                                                                    r with
                                                                    Course
Sample                                                       N      Grades

1. Introductory psychology class at California,
   March, 1951.                                              121
2. Introductory psychology class at California,
   June, 1951.
3. Introductory psychology class at California,
   August, 1951.
4. Elementary statistics class at California,
   October, 1950.





Table 4

Summary Statistics for the Cross-Validating Samples Given the 32-Item Version of the Hr Scale
Included in the California Psychological Inventory

                                                                              r with
                                                                              Course
Sample                                                   N      M      SD     Grades

21.3 40 31

1. Introductory psychology class at California,
   October, 1951.
2. Introductory psychology class at Stanford,
   March, 1952.
3. Introductory psychology class at California,
   April, 1952.
4. Introductory psychology class at California,
   July, 1952.
5. Introductory psychology class at California
   (Santa Barbara campus), December, 1952.
6. Upper division psychology class at California,
   April, 1952.                                          139
7. Upper division psychology class at California,
   July, 1952.                                           29


Four cross-validating samples, totalling 336 
cases, were given the initial 36-item scale.⁷
Table 3 presents the findings. The median r
here is .33, and the mean r, using the z-transformation,
is .38.

The original 36-item Hr scale contained 
four items pertaining to present attendance in 
school (the last four items in the list above). 
These items were eliminated in the 32-item 
version of the Hr scale included in the Cali- 
fornia Psychological Inventory. This inven- 
tory was given to seven additional college 
samples to obtain cross-validational informa- 
tion on the 32-item scale, when included in a 
large constellation of items.⁸

7 These samples were made available through the
courtesy of Drs. W. Brown, J. McKee, L. Postman,
and R. Tryon.

8 These samples were made available through the
courtesy of Drs. W. Brown, J. Clark, P. Farnsworth,
D. Krech, D. MacKinnon, and D. Riley.
Table 5

Correlation of the 32-Item California Psychological Inventory Version of the Hr Scale with
High School Grade Average

High School                       N      M

1. Butler, Pennsylvania           397    15.6
2. Clarksdale, Mississippi        77     14.3
3. Franklin, Pennsylvania         108    15.8
4. Mt. Vernon, Washington         107    16.5
5. Rock Island, Illinois          224    15.1
6. St. Cloud, Minnesota           195    14.7








Table 4 presents these data. The total
number of cases is 917, and the median r =
.32. The mean r, using the z-transformation,
is again .38.

Because the California Psychological Inven- 
tory is designed to be used in high school as 
well as in college settings, the efficacy of the 
Hr scale in predicting high school over-all 
grade averages was determined. Table 5 pre- 
sents these data. The total N is 1,108, the 
median r = .38, and the mean r = .36. 

The Hr scale, along with a wide variety of 
other tests, was also given to a sample of 
40 senior medical students seen at the Uni- 
versity of California Institute of Personality 
Assessment and Research in an intensive assessment
program.¹⁰ Some of the more prominent
findings are presented in Table 6.

Perhaps the most important observation 
here is that the Hr scale correlates with cri- 
terion ratings of achievement in medicine as 
well as it does with undergraduate course 
grades in psychology. Furthermore, its pat- 
tern of correlation with the other variables 
listed is uniformly favorable, with the possible 
exception of the staff rating on impulsivity. 

One of the questions which might now be 
raised is whether the Hr scale is assessing any 
independent achievement variance, or whether 
it is primarily an indirect measure of intellect. 
Table 7 affords evidence relevant to this query. 


9 These samples were made available through the
kindness of Mr. C. O. Austin, Mrs. M. S. Gleason, 
Mr. G. N. Harriger, Mr. H. B. Heidelberg, Mr. R. H. 
Sorenson, and Mr. R. F. Wilson. 

10 The research at the Institute of Personality As- 
sessment and Research is being conducted under a 
grant from the Rockefeller Foundation. See refer- 
ence (3) for a discussion of the work of this Insti- 
tute. 


The six correlations with IQ in the high 
school samples are all lower than they are for 
Hr vs. grade averages, and a similar difference 
obtains for the college sample. In the mili- 
tary sample of 150 cases Hr correlates only 
.10 with intellect, but .50 with a measure of 
scholastic achievement. The mean r with the 
intellectual variables in Table 7 is .26, and 
with the indices of achievement is .38. 

If these values are taken as reasonable ap- 
proximations of the true parameter values, an 
estimate of the multiple R between IQ, Hr, 
and scholastic achievement can easily be made. 
For the typical value of .50 between IQ and 
grades, the multiple R would be .57, for a 
value of .60 the multiple R would be .64, and 
so on. Hr would thus appear to be a partial 


Table 6

Correlation of the Hr Scale with a Variety of Measures and Assessment Variables in a Sample of 40
University of California Senior Medical Students

Variables                                                             r

1. Medical faculty criterion ratings.
   a. Potential success                                               .31
   b. Originality                                                     .33
2. Assessment staff ratings.
   a. Personal tempo
   b. Breadth of interests
   c. Vitality
   d. Impulsivity
   e. Verbal fluency
   f. Originality
   g. Positive affect
   h. Rigidity
3. Ratings of performance in improvisations.
   a. Dominance
   b. Flexibility
   c. Ingenuity
4. Ratings of performance in charades.
   a. Motility
   b. Over-all effectiveness
   c. Perseveration
   d. Self-consciousness
5. Perceptual-cognitive variables.
   a. Size constancy estimation (near and far
      triangles, smallness of error in judging)
   b. Luminous tilted square, total error in
      adjusting inner line to upright
   c. Street Gestalt pictures, accuracy of recognition








A Personality Scale to Predict Scholastic Achievement 


Table 7

Comparative Correlations between Hr and the Intellectual and Achievement Variables Indicated

                                                          Correlation of Hr with
                                                       Intellectual     Achievement
Sample                                          N      Variable*        Variable**

I. High Schools
   1. Butler, Pennsylvania                             .33              .38
   2. Clarksdale, Mississippi                          .10              .26
   3. Franklin, Pennsylvania                           .37              .38
   4. Mt. Vernon, Washington                           .30              .35
   5. Rock Island, Illinois                            Be               .42
   6. St. Cloud, Minnesota                             .33              .39
II. College
   1. University of California, Santa
      Barbara, psychology class                        .22              .32
III. Other
   1. Military Officers                         150    .10              .50

* In the high school samples, standard group tests of intelligence were used. In the college sample the
criterion was the Altus Measure of Verbal Aptitude, and in the military sample the Thurstone Primary Mental
Abilities Test.

** In the high school samples the criterion was the over-all high school grade average, in the college sample
the course grade in psychology, and in the military sample the USAFI Test of General Educational Development,
Reading Comprehension in the Social Sciences.


Table 8

Correlation of the Hr Scale with Other Scales from
the California Psychological Inventory, in a
Nationwide High School Sample*

CPI Scale                                           Females    Males

 1. Re (responsibility)                             55         46
 2. To (tolerance)                                  16         70
 3. Fl (flexibility)                                31         38
 4. St (status)                                     53         48
 5. Do (dominance)                                  31         .24
 6. Sp (social participation)                       38         19
 7. Fe (femininity)                                 A          O04
 8. De (delinquency)                                .27        |
 9. Ie (intellectual efficiency)                    é          B
10. Ac (academic achievement, high school)          a          50
11. Py (psychological interests)                    46
12. Ip (academic motivation, graduate school)       J          29
13. Ne (neurodermatitis)                            ‘          .26
14. X1 (poise and spontaneity)                      33         27
15. X2 (impulsivity and self-centeredness)          AO         44
16. In (infrequency)                                07         03
17. Gi (good impression)                            39         37
18. Ds (dissimulation)                              -.52       -.46

* 2,423 females, 2,077 males, from 16 high schools
in 13 states.

predictor of academic outcomes in its own 
right without drawing to any great extent on 
intellectual factors, and can also add slightly 
to the multiple R prediction of grades from 
measures of intelligence. 
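
The multiple R estimates quoted above can be approximated with the standard formula for two predictors, taking .38 as the correlation of Hr with achievement and .26 as the correlation of Hr with intellect. The sketch below is only an illustration: the paper does not state which formula was used, and the function name `multiple_r` is hypothetical. Under these assumptions the results are about .56 and .64, close to the reported values of .57 and .64.

```python
import math


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion y with two predictors.

    r_y1, r_y2: correlations of each predictor with the criterion.
    r_12: correlation between the two predictors.
    """
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)


R_HR_GRADES = 0.38  # mean r of Hr with indices of achievement (from the text)
R_HR_IQ = 0.26      # mean r of Hr with intellectual variables (from the text)

for r_iq_grades in (0.50, 0.60):
    estimate = multiple_r(r_iq_grades, R_HR_GRADES, R_HR_IQ)
    print(r_iq_grades, round(estimate, 2))
```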

The intercorrelations of Hr with the 18 
other scales on the CPI are presented in Ta- 
ble 8. The highest relationships are with the 
scales for tolerance, flexibility, intellectual effi- 
ciency, and psychological interests. 

The final information presented in this pa- 
per has to do with the social psychological im- 
plications of higher and lower scores on the 
Hr scale. In the research program at the 
Institute of Personality Assessment and Re- 
search previously referred to, each staff mem- 
ber filled in a Gough Adjective Check List (3) 
about each assessee. For some of the analyses 
these observers’ reports were composited into 
a single “general observer’s” report by considering
each adjective checked by at least 2
out of 6 senior staff members as being “present,”
and as being “absent” if checked by
only one, or by none.
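
The compositing rule just described can be sketched in a few lines of Python. The adjective data and the function name `composite_checklist` are hypothetical; only the threshold of at least 2 of the 6 senior staff members comes from the text.

```python
from collections import Counter


def composite_checklist(rater_checklists, min_raters=2):
    """Return the adjectives counted as "present" in the composite report.

    rater_checklists: one collection of checked adjectives per rater.
    An adjective is "present" if at least min_raters raters checked it;
    every other adjective is treated as "absent".
    """
    counts = Counter()
    for checked in rater_checklists:
        counts.update(set(checked))  # each rater counts once per adjective
    return {adj for adj, n in counts.items() if n >= min_raters}


# Hypothetical check lists from six raters describing one assessee.
raters = [
    ["alert", "capable"],
    ["alert"],
    ["dull"],
    [],
    ["capable", "alert"],
    [],
]
print(sorted(composite_checklist(raters)))
```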

These composited adjective check lists were 
used to carry out an analysis of the social 
stimulus values of the Hr scale. Two samples
of 30 each were drawn by selecting the
10 highest and 10 lowest subjects on the Hr 
scale from two graduate student samples of 
40 each, and from the sample of 40 medical 
school seniors already mentioned. A study 
was then made of what observers did, in fact, 
say about the 30 highest ranking students, as 
compared with what they did, in fact, say 
about the 30 lowest ranking students. The 
adjectives showing statistically significant dif- 
ferentiations are listed below: 


I. Adjectives checked more frequently about
higher-scoring subjects on the Hr scale.

adaptable          determined         persevering
alert              efficient          planful
ambitious          fore-sighted       pleasant
appreciative       honest             rational
capable            industrious        reasonable
clear-thinking     intelligent        realistic
conscientious      interests wide     reliable
cooperative        logical            responsible
dependable         organized          resourceful

II. Adjectives checked more frequently about
low-scoring subjects on the Hr scale.

cautious           nervous            sentimental
dissatisfied       preoccupied        shy
dull               rebellious         wary
immature           rigid


The patterning of these adjectives is very 
consistent. “Highs” are seen as alert, clear-thinking,
efficient, intelligent, pleasant, and
resourceful. “Lows” are seen as dull, imma- 
ture, rebellious, rigid, and wary. The staff 
raters, of course, had no information whatso- 
ever about the Hr scores of these subjects. 


Summary 


A personality scale to predict undergraduate
grades was developed. A mean r with
course grades of .38 in eleven cross-validat- 
ing college samples totalling 1,253 cases was 
attained. The Hr scale also predicted high 


Harrison G. Gough 


school grades, giving a mean r of .36 in six 
high school samples totalling 1,108 cases. 

Evidence from eight samples, including
1,362 cases, was adduced to support the claim 
that the Hr scale is a predictor of academic 
achievement and not simply an indirect and 
inefficient measure of intellect. In these sam- 
ples the mean correlation of Hr with measures 
of intellect was .26, and with indices of aca- 
demic achievement was .38. 

Additional findings in a sample of 40 senior 
medical students revealed a significant cor- 
relation between Hr and ratings of success in 
medical training, and between Hr and a num- 
ber of assessment variables such as breadth 
of interests, originality, flexibility, vitality, effectiveness
in group discussion and in charades, and adequacy of performance on
perceptual-cognitive tasks involving complex
judgmental decisions. 

The final section of the paper listed some 
of the more prominent social-interactional im- 
plications of higher and lower scores on the 
Hr scale. High scorers tend to be seen as ca- 
pable, intelligent, and reliable and low scorers 
as dissatisfied, dull, rigid, and shy. 


Received November 20, 1952. 
This paper reports the comparative scores
made on the Kuder Preference Record by sam- 
ples of graduates of the Indiana University 
Schools of Medicine, Law and Business who 
had graduated in 1941 or previously. In 
1950, the Preference Record (Form C) was 
sent to 996 graduates of the School of Busi- 
ness, 764 graduates of the School of Law and 
992 graduates of the School of Medicine. Re- 
turns were received from 313 for Business, 
210 for Law and 242 for Medicine. The 
mean ages of the respondents were 37.5 years 
for Business, 45.2 for Law and 45.2 for Medi- 
cine. Each return indicated the individual’s 
present occupation. Striking and significant
differences have been found among the inter- 
est patterns of the various groups studied. 

The first comparison presented is between 
the interests of doctors and those of lawyers, 
accountants, and other Business School gradu- 
ates. Table 1 gives these data. Since ac- 
countants differ so markedly from other busi- 
ness groups they have been kept separate in 
the table. It should be noted, too, that only


practicing lawyers from the law school are re- 
ported in this table. The data from non- 
lawyers are presented later. 

In this and the following tables the mean 
raw scores reported in the first line are taken 
as the basis for comparison. The t-test was
used to determine the significance of differ- 
ences of means from the base group. 
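
A comparison of this kind can be approximated from the published summary statistics alone. The Python sketch below computes a t ratio from two means, standard deviations, and sample sizes, using an unpooled standard error of the difference; that choice is an assumption, since the paper does not say which form of the t-test was applied. The example values are the Outdoor means and standard deviations for medical school graduates and practicing lawyers as they appear to read in Table 3.

```python
import math


def t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """t ratio for the difference between two independent group means.

    Uses the unpooled standard error sqrt(sd1^2/n1 + sd2^2/n2), which is
    an assumption rather than the authors' documented procedure.
    """
    se_diff = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (m1 - m2) / se_diff


# Outdoor scale: Med. School graduates versus practicing lawyers.
t_outdoor = t_from_summary(47.7, 14.3, 242, 40.1, 14.7, 148)
print(round(t_outdoor, 1))
```

With groups of this size, a ratio of about 5 lies well beyond the value of roughly 2.6 needed at the 1% level, which agrees with the lawyers' Outdoor mean being marked as significantly lower in Table 1.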

The standard deviations of all groups
studied are reported in Table 3.


Results 


Inspection of Table 1 reveals that doctors 
had scores significantly different at the 1% 
level from lawyers on one of the ten scales, 
and from the business groups on seven and 
eight of the scales. As a general pattern, 
when compared to the other groups, doctors 
were higher at the 1% level of significance on 
the scientific, social service, artistic and out- 
door scales and lower at the same level of sig- 
nificance on the computational, persuasive, 
and clerical scales. 


Table 1

Mean Interest Scores of Lawyers and Businessmen Compared with Those of Doctors

                                     Outdoor    Mech.     Comp.

Med. Sch. Grads (N = 242)            47.7       40.9      23.9
Practicing Lawyers (N = 148)         40.1†      33.2†     27.8**
Accountants (N = 44)                            39.8      42.1**
Bus. Grads excl. Accts. (N = 269)    37.2       37.7†     29.8**

** Significantly higher at the 1% level of confidence.
* Significantly higher at the 5% level of confidence.
† Significantly lower at the 1% level of confidence.
Sci.
49.6 
36.6 


35.8t 


34.0t 


Soc. 
Serv. 
45.1 


Cleri- 
cal 


37.5 


Art. 
24.5 


Mus. 
12.8 


13.7 a2" De 


39.6** 12.0 32.73 @.3” 


Se" 9s 14.1* 39.2} 50.3** 





Robert H. Shaffer and G. Frederic Kuder 


Table 2 


Mean Interest Scores of Accountants and Other Business School Graduates 
Compared with Those of Lawyers 








Out- 
door Mech. Comp. Sci. 


Pers. Art. Lit. Mus. 





Practicing Lawyers 40.1 33.2 27.8 
(N = 148) 
Accountants 
(N = 44) 
Bus. Grads excl. 


Accts. (N = 269) 


38.0 39.8°*  42.1** 


37.2¢ 37.7** = 29.8* 


36.6 
35.8 


34.0 


41.6 26.5 13.7 


39.6 23.0 12.0 


a 21.8f 14.1 





** Significantly higher at the 1% level of confidence. 
* Significantly higher at the 5% level of confidence. 
† Significantly lower at the 1% level of confidence.

Significantly lower at the 5% level of confidence. 


Table 2 gives the data resulting from a 
comparison of the interest scores of lawyers 
to the two business groups. As a general pat- 
tern the lawyers were significantly higher than 
businessmen other than accountants in the 
literary and scientific areas and lower in the 
persuasive and mechanical areas. The com- 
parison with accountants differed from this 
pattern. The lawyers had lower computa- 


tional, clerical and mechanical scores, and 
higher social service and literary scores than 


accountants. The comparison of lawyers with 
physicians was noted in the discussion of Table 1.

It is interesting also to note differences be- 
tween subdivisions of the graduates of the 
various schools. Table 3 gives these data. 
The medical school graduates were scattered 
among a number of specialties, but some did 
not report enough detail to allow a more spe- 
cific classification than that of physician. 
However, there were enough who could be 
classified specifically as surgeons and phy- 
sicians-in-general-practice to justify a com- 
parison of the scores from the two groups. 
The most significant difference between these 
groups is on the social service scale, the phy- 
sicians-in-general-practice being significantly 
higher within the 1% level of confidence. 
They are also higher on the scientific scale 
but only at the 5% level. A trend well with- 
in the 10% level of confidence may also be 
noted for surgeons to be higher on the me- 
chanical scale. 

Although it appears that the graduates of 
the medical and business schools stay in these 
fields, this generalization does not hold for 


the law graduates. Perhaps the distinction 
between law and business is the result of the 
terminology used, since “business” covers a 
tremendously wide range of activities. A per- 
son could change his occupation greatly and 
still be in the field of business. At any rate, 
of the law school graduates responding, 29.5% 
reported they were in occupations other than 
law. Most of these are in related fields often 
involving managerial or administration work 
in business and industry where they presum- 
ably have occasion to apply their training in 
law. Those who are not actually practicing 
law are significantly and perhaps surprisingly 
higher on the persuasive scale. This is the 
only significant difference noted between the 
two groups, as shown in Table 3. 

The graduates of the business school are in 
a wide variety of occupations and except for 
the accountants the occupational groups were 
too small to justify a breakdown analysis. 
Hence the scores of the accountants are com- 
pared with those of the remaining graduates 
of the business school. The results are re- 
ported in Table 3. As might be expected, 
the accountants are significantly higher in the 
computational and clerical areas. A negative 
difference of lesser significance may also be 
noted on the musical scale. A large propor- 
tion of the non-accountant group is in man- 
agerial or sales occupations. These results 
are consistent with those previously reported¹
for senior men in the I. U. School of Busi- 
ness. Senior accounting majors were found 


1 Shaffer, R. H. Kuder interest patterns of university
business school seniors. J. appl. Psychol.,
1949, 33, 489-493.
Table 3


Mean Interest Scores of Sub-Groupings of Doctors, Lawyers and Businessmen 








Group Outdoor Mech. Pers. Art. Lit. Mus. 


Med. School 
Graduates 
N = 242t 


Surgeons 

N = 50 
Physicians-in- 
Gen’l-Pract. 
N = 66 


Law School Grads. 
Lawyers 
N = 148 


Non-Lawyers 
N = 62 


Business School 
Grads other 
than Accts. 


N = 269 


Comp. 





M 
SD 
M 
SD 


47.7 
14.3 


51.4 
11.8 


40.9 
11.8 


44.5 
12.5 


23.9 
8.3 


22.1 
8.8 


30.7 24.5 


8.9 


24.5 
8.6 


21.8 
7.1 
22.7 
6.7 


12.8 
6.5 


13.4 
6.9 


M 
SD 


48.6 
14.0 


40.6 
10.8 


24.7 23.9 


8.4 


20.4 11.0 


6.2 


M 
SD 
M 
SD 


40.1 
14.7 


42.2 
15.6 


33.2 
13.1 


34.4 
13.1 


20.2 
8.6 
20.0 
8.4 


M 
SD 


; 29.8 
é 9.9 


42.1** 


34.0 
10.3 


35.8 


51.1 
15.4 


39.6t 


39.2 
11.9 


32.7} 


50.3 
13.1 


60.3** 


19.6 
8.5 


17.5 


14.1 
6.2 


12.08 


37.2 3 
14.0 1 
38.0 


Accountants M 39.8 


N = 44 SD 14.4 12.8 8.0 10.1 


14.3 91 5.7 11.6 


12.2 





t As indicated, 242 Medical School graduates returned questionnaires. A large number of respondents
merely stated they were “doctors” instead of indicating actual type of their practice. To prevent erroneous
grouping, the returns of only those who indicated they were in “general practice” or “surgeon” were used for
statistical comparisons. The t-test for Med. School graduates was confined to comparison of surgeons and
physicians-in-general-practice.
** Significantly higher at the 1% level of confidence.
* Significantly higher at the 5% level of confidence.

t Significantly lower at the 1% level of confidence.

§ Significantly lower at the 5% level of confidence.


to have significant differences in all nine scales
of the Kuder Form B when compared to all
business seniors.


Summary


The Kuder Preference Record (Form C)
was given to a sampling of the 1941 or prior
graduates of the Indiana University Schools
of Medicine, Law, and Business. The mean
raw scores of these groups and sub-groups
were compared with the following results:

1. Significantly different interest patterns
were found for doctors, lawyers, and businessmen.

2. In general doctors were higher than the
other groups on the social service, scientific,
artistic and outdoor scales, and lower on the
computational, persuasive, and clerical scales.

3. Lawyers compared to businessmen other
than accountants were higher in the literary
and scientific areas and lower in the persuasive
and mechanical areas. The comparison
with accountants revealed a different general
pattern. The lawyers had lower computational,
clerical, and mechanical scores,
and higher social service and literary interest
than accountants.

4. Physicians-in-general-practice were found 
to be higher on the social service and scientific 
scales than surgeons. There is also a trend 
of less statistical significance for surgeons to 
be higher on the mechanical scale. 

5. The graduates of the law school who are 
not practicing law were found to have a higher 
average persuasive score than the practicing 
lawyers. 

6. Accountants had higher computational 
and clerical scores than other business groups, 
and lower social service and persuasive scores. 


Received November 10, 1952. 
A time-honored means of selecting candidates
for a given profession or vocation has
been the use of the interest or preference in- 
ventory. In the opinion of many investiga- 
tors, interest patterns are more indicative in 
such selection than are the data afforded by 
personality schedules and measures of apti- 
tude now in use. In reporting a comparative 
study of students preparing for five selected 
professions, Blum states (2): “It is significant 
that the greatest differences . . . were in their 
vocational and non-vocational interest tend- 
encies rather than in personality traits. . . .” 
Triggs (3), drawing upon her wide experi- 
ence in counseling individual nurses, makes 
the observation that of those students who 
fail or withdraw from the nursing curriculum, 
the most common finding is deviation in 
scores on the interest inventory; in no other 
respect is she so likely to deviate from the 
usual pattern of scores made by the suc- 
cessful nurse. Using the Kuder Preference 
Record with a group of nurses and a group 
of women-in-general, Triggs found that it did 
an excellent job of differentiation. The writer 
has attempted a similar study with student 
nurses and liberal arts college girls with an 
education major. 


The Present Study 


The experimental group consisted of 80 
students in Knapp College of Nursing in 
Santa Barbara, California, ranging in age 
from 17 to 25 years, all Caucasians with the 
exception of one Japanese girl. Matched in- 
dividually for sex, age, percentile on ACE, 
and race, the control group of 50 girls was 
selected from liberal arts college students 
with an education major enrolled in the Uni- 
versity of California, Santa Barbara College. 
Table 1 presents these data. The Kuder 
Preference Record was administered to the 
student nurses as a part of the battery of 
qualifying tests given prior to admission to 
the Knapp College of Nursing. The educa- 


tion majors took the inventory on request, as 
one of a short battery of tests. 

Table 2¹ gives the mean percentile scores
for each of the nine scales of the Kuder 
Preference Record for both the experimental 
and the control groups. Also given are the 
sigmas for each scale, the standard deviation 
of the means, the sigmas of the difference and 
the critical ratios of the difference. Four of 
the nine scales yield critical ratios at the .01 


Table 1

Matching Variables, Experimental and Control Groups

                           Experimental     Control
                           Group            Group
Variables                  N = 80           N = 50

Age, Mean                  18.7             18.8
Age, SD                    1.5              1.7
ACE, Mean percentile       34.4             35.1
ACE, SD                    22.6             22.7







level of confidence. Science, with a mean
score of 64.6 for the nurses and a mean score
of 46.4 for the education majors, has a t
value of 8.21. Furthermore, the Persuasive,
Literary, and Social Service scales also yield
highly significant differences between the
groups, the respective t values being 4.97,
4.17 and 4.94. Figure 1 presents graphically
the means of the two groups for all scales of
the Kuder Preference Record. Ranked from
highest to lowest CR value, the four scales
which are least significant in differentiating
the groups are Musical, Artistic, Mechanical
and Clerical.

A less conventional type of analysis of 
the Kuder Preference Record was also under- 
taken. The “most” and “least” choices for 
each item in the clusters of three in all twelve 


1 Tables 2 and 3 have been deposited with the
ADI Auxiliary Publications Project. Order Document
No. 4068 from ADI Auxiliary Publications
Project, Chief, Photoduplication Service, Library of
Congress, Washington 25, D. C., remitting $2.50 for
photocopies (6 X 8 inches) or $1.75 for microfilm.
Fig. 1. Comparison of mean percentile scores of
80 cadet nurses and 50 college freshmen on 9 cate- 
gories, Kuder Preference Test. Significant at the .01 
level = science, CR 8.21; persuasion, CR 4.97; litera- 
ture, CR 4.17; and social service, CR 4.94. 


columns of the answer booklet were computed 
for both the experimental and the control 
groups. The CR of the percentage of the 
difference for the items was then computed. 
A total of 76 items were found to be sig- 
nificant at the .01 level of confidence. This 
number included 40 items in which the choice 
of the item as “most” served as the basis for 


the differentiation of the groups. Another 
16 items were found in which the choice 
“least” by the groups served as the basis for 
identification. In still another 20 items, 
either choice, “most” or “least,” yielded t
values at the .01 level of confidence. In sum,
then, there were found 76 items which permit
96 choices which appear to be valid for differentiating
the student nurse from the liberal
arts college education major. The t values
for the 96 choices ranged from 2.62 to 8.22.
The highest value was obtained for the item 
“Be a chemist.” It is checked as a “most” 
choice by the nurses in the cluster that also 
includes “Be a machinist” and “Be an architect.”
Table 4 lists samples of the 76 items.2
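
The item analysis above rests on the critical ratio of a difference between two percentages and on a simple tally of items and choices. The sketch below assumes the unpooled standard-error form; the article does not state which formula was used, and the function name `critical_ratio_of_percentages` and the example percentages are hypothetical.

```python
import math


def critical_ratio_of_percentages(p1, n1, p2, n2):
    """Critical ratio for the difference between two independent proportions.

    Assumes the unpooled standard error sqrt(p1*q1/n1 + p2*q2/n2);
    the article does not say which variant was actually used.
    """
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) / se


# Hypothetical example: 60% of 80 nurses and 20% of 50 education majors
# mark the same item as a "most" choice.
cr = critical_ratio_of_percentages(0.60, 80, 0.20, 50)

# Tally reported in the text: items significant on "most" only,
# on "least" only, and on both choices.
most_only, least_only, both = 40, 16, 20
items = most_only + least_only + both        # 76 items
choices = most_only + least_only + 2 * both  # 96 choices
most_choices = most_only + both              # 60 "most" choices
least_choices = least_only + both            # 36 "least" choices
```

With these hypothetical percentages the ratio comes out near 5.1, well above the smallest value (2.62) reported as significant at the .01 level.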

Further examination of these choices in 
terms of their meaning to the student nurse 
or education major gives evidence of a ra- 
tional basis for most of the items. The stu- 
dent nurse is or is expected to be vitally in- 
terested in science, chemistry, working in a 
laboratory and in its equipment, the discovery 
of cures, and the care of sick people. When 
an item of this type is checked as a “most” 
choice by the student nurse, it usually has a 
high t value; that is, it distinguishes the student
nurse from the education major because
members of the latter group seldom check
such items as a “most” choice.

2 Table 3 in its entirety is deposited with ADI. See footnote 1.


Table 4

Samples of the Significance of the Difference Between the Responses of 80 Student Nurses and 50 College
Women in Education Curricula to “Most” and “Least” Choices on the Kuder Record

Item Triads                                        Item Value    Marking and
                                                   of Diff.      Group Involved

Take a course in sketching.                        4.44          “most,” Educ.
Take a course in biology.                          5.19          “most,” Nurses
Take a course in metal working.

Do chemical research.                              6.18          “most,” Nurses
Do chemical research.                              5.77          “least,” Educ.
Interview applicants for employment.               4.40          “most,” Educ.
Write feature stories for a newspaper.             3.18          “least,” Nurses

Write a political campaign song.
Write an article on how machine tools are made.
Design a computing machine.                                      “most,” Nurses


The education major prefers teaching children, writing
a best seller, being a journalist, scoring ex- 
aminations as a means of earning pin money 
and interviewing people in a survey of public 
opinion, to name a few of the interests which 
her choices reveal. What does the nurse want 
“least” to do? It will be recalled that the 
nurse scored low in the Persuasive area. She 
has no interest in selling nor is she interested 
in writing a newspaper column. Being a 
journalist or a literary critic or a famous radio 
commentator has no appeal for her. The 
college education major, on the other hand, 
looks with disfavor upon work in chemistry, 
anything to do with a laboratory or research 
equipment, and anything associated with a 
hospital.3

A few seemingly odd choices on the part of 
both groups require interpretation. A “most” 
choice of the education major, the t score
rating of which is second in magnitude to all 
the ratings, seems highly peculiar in the light 
of what the writer believes to be her interests. 
The item is “Be an architect.” This is one
of the cluster mentioned previously in which
the “most” choice of the student nurse is “Be
a chemist.” The education major obviously
does not want to be a chemist; nor is she interested
in being a machinist. She is thus put
in the position of making a forced choice, and 
checks the least offensive item, which is for 
her “Be an architect.” For this same cluster 
the “least” choice of the education major is 
“Be a chemist.” The critical ratio of the dif- 
ference for this “least” choice is 3.04. Other 
possible forced choice answers may be cited. 
The item “Sell musical instruments” is checked 
as a “most” item by the education majors. 
With a t rating of 5.26 the choice is contrasted
with the nurse’s choice of “Help in a sick 
room.” Neither wishes to “Repair household 
appliances.” Another example of a forced 
choice, “Design a computing machine” is 
checked as a “most” choice by the student 
nurses in preference to “Write a political 
campaign song” or “Write an article on how 
machine tools are made.” 

3 These students are working toward certification
in the fields of early childhood education, elementary
education, and physical education at secondary levels.

In a previous study by the writer (1) of
responses to the MMPI made by student
nurses and a matching group of college educa- 
tion majors, 66 items were singled out which 
differentiated the groups at the .05 level of 
confidence or better. Of these items, 23 were 
found to be significant at the .01 level. On 
the basis of an analysis of these items, pre- 
sumptive evidence of personality attributes 
possessed by the student nurse was presented. 
A point of interest in the present investigation 
was the possibility of parallel findings or re- 
lated data in the patterns of response of the 
student nurse to two inventories designed to 
explore different facets of personality. Find- 
ings are largely negative. Only one striking 
parallel exists. The two items with the highest
t ratings on the previous study were “I
like science” and “I like to read about
science” respectively. The two items with
highest t ratings in the “most” choices of the
student nurse on the Kuder Preference Record 
are “Be a chemist” and “Give popular lec- 
tures on chemistry.” As stated previously, 
one can readily infer a logical basis for the 
nurse’s choice. 

Triggs (3) obtained Preference Record 
scores on 826 graduate nurses and compared 
them with the scores of 1246 women-in-gen- 
eral. She found that the interests of nurses 
differed significantly at the .01 level from 
women-in-general on all scales of the Prefer- 
ence Record except on the Artistic, where the 
difference was found to be significant at the 
.05 level only, and on the Mechanical, where 
no significant difference was found. Listed in 
the order of magnitude, the scales showing 
positive differences were Social Service,
Science, Artistic and Musical. Those show- 
ing negative differences were Persuasive, 
Clerical, Computational and Literary scales. 
Not closely related to the present study but 
of interest is another study by Triggs (4) in 
which she used the scores of the Kuder Prefer- 
ence Record to determine whether different 
interest patterns exist in specialized fields of 
nursing. She reports reliable differences; for 
example, the Public Health nurse makes sig- 
nificantly higher scores on the Persuasive and 
Social Service scales, and significantly lower 
scores on the Computational and Clerical 
scales. 
Summary and Conclusions


1. An investigation into the interest pat- 
terns of student nurses as contrasted with stu- 
dents majoring in education curricula in a 
liberal arts college utilized the responses of 
the respective groups on the Kuder Preference 
Record. A total of 80 Knapp College of 
Nursing students were matched for race, age, 
and percentile on ACE with 50 education 
majors from the University of California, 
Santa Barbara College. 

2. Mean scores of the two groups in four 
of the interest areas yielded critical ratios of 
the difference at the .01 level of confidence. 
The Science scale with a mean score of 64.6 
for the nurses and 46.4 for the education 
majors has the highest t value of 8.21. The
areas Persuasive, Literary and Social Service
yielded respective t values of 4.97, 4.17 and
4.94. 

3. Another form of analysis was attempted 
in order to identify items in the clusters of 
three in which “most” and “least” choices 
showed a valid difference for the experimental 
and control groups. A total of 96 choices 
made in response to 76 items were found to 
be significant at the .01 level of confidence. 
Of these, 60 were “most” choices and 36 
“least” choices. The t values for the individual
items varied from a high of 8.22 to
a low of 2.62. 

4. The student nurse by reason of her 
“most” choices manifests interest in or prefer- 
ence for science, anything pertaining to the 
laboratory, the discovery of cures, and the 
care of the sick. The college education major,
on the other hand, anticipates a liking for
teaching children, is interested in various 
forms of writing, and in interviewing people 
for public opinion surveys. The “least” 
choices of the student nurse are heavily 
weighted with pursuits which require per- 
suasion or selling, any form of writing, re- 
porting or literary criticism. The education 
major’s “least” choices indicate antipathy for 
work in chemistry, anything that has to do 
with laboratory or research equipment and 
anything associated with a hospital. 

5. An attempt to find similar personality 
trends in the response patterns of the student 
nurse to the Kuder Preference Record and to 
the MMPI was not successful. A previous 
study (1) furnished data for the compari- 
son. In one respect only were the findings 
comparable. The two items in the MMPI 
receiving the highest t value referred to a
liking for science. Similarly in the Kuder, 
the two “most” choices of the nurses having 
the highest rating referred to interest in being 
a chemist or giving lectures in chemistry. 
Received June 2, 1953. 
Screening devices for choosing candidates
for a given profession or occupation have in 
the main centered in the cognitive rather than 
the affective areas. Measurement of aptitudes 
has achieved a degree of objectivity and validity
that is still somewhat rare in the less
tangible areas of personality and motivation. 
Yet the importance of the latter factors has 
not gone unnoticed. In the field of nursing 
subjective evaluations have pointed to cer- 
tain intrinsic personality qualifications as es- 
sential to success. Disciplined efficiency un- 
der emergency conditions and the ability to 
give comfort and reassurance to the patient 
are among the demands made upon the nurse. 
More specific identification of essential traits 
and a means of measuring them is a goal not 
yet achieved. 

The belief that nurses as a group do repre- 
sent a more stable segment of the population 
than the average has had popular acceptance 
for some time. Studies bulwarking this be- 
lief have not been lacking. In 1927 Elwood 
(1) studied two groups, one made up of 
nurses, the other college girls. Using Laird’s 
Introvert-Extrovert Scale and Woodworth’s 
Emotional Inventory, he concluded that both 
tests placed the nurse in a more favorable 
light. Lough (3), using the responses on the 
Minnesota Multiphasic Personality Inventory
as a basis for her analysis, compared
nursing students with women students en- 
rolled in liberal arts and education curricula. 
She reported the nurses as being more stable 
than the other groups and as having more 
masculine interests. In a subsequent study
Lough (4) substantiated her findings through
a statistical validation of the differences found
between the cadet nurses and the students of 
General Curriculum. Healy and Borg (2), 
using the Guilford-Martin battery of personality
tests measuring thirteen putative factors,
compared a group of nursing-school
freshmen from six schools of nursing with 
students at the University of Texas. They 
found no characteristic pattern in the analy- 
sis of scores of the beginning nurses. They 
state that this is to be expected to some ex- 
tent since the students are not screened in 
most of the schools and the data were col- 
lected prior to the withdrawal of students 
not fitted to the program. 


The Present Study 


The writer’s study investigating the per- 
sonality attributes of student nurses is also a 
comparison between student nurses and a 
group of college women students majoring in 
education curricula. A total of 86 women 
students enrolled in the Knapp College of 
Nursing at Santa Barbara, California, made 
up the experimental group. These were 
matched for race, sex, age and percentile on 
the ACE, with an equal number of education 
majors at the University of California, Santa 
Barbara College. There was one Japanese in 
each group, the remainder being Caucasian. 
The age range for both groups was 17 to 25 
years. Means for the groups for age and for 
ACE percentiles are presented in Table 1. 
Students were individually matched for these 
variables. In the majority of cases age was
also held constant or varied by not more than
two years. Variation in ACE percentile points
was not greater than three points except for
a very small number of cases.


Table 1
Matching Variables, Experimental and Control Groups

                            Age              ACE Percentile
                        Mean    SD          Mean    SD

Experimental Group      18.7    1.5         34.4    22.6
Control Group           18.7    a7          35.1    22.8
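
The individual matching described above can be sketched as a greedy pairing under tolerances. This is only an illustration: the article gives the tolerances (age within two years, ACE percentile within three points in most cases) but not the procedure itself, so the record layout, the function name `match_pairs`, and the sample data are assumptions.

```python
def match_pairs(nurses, candidates, max_age_gap=2, max_ace_gap=3):
    """Greedily pair each nurse with an unused candidate within tolerance.

    Each record is a (student_id, age, ace_percentile) tuple. Returns a list
    of (nurse_id, candidate_id) pairs; nurses without a match are skipped.
    """
    unused = list(candidates)
    pairs = []
    for nurse_id, age, ace in nurses:
        eligible = [
            c for c in unused
            if abs(c[1] - age) <= max_age_gap and abs(c[2] - ace) <= max_ace_gap
        ]
        if not eligible:
            continue
        # Prefer the closest ACE percentile, then the closest age.
        best = min(eligible, key=lambda c: (abs(c[2] - ace), abs(c[1] - age)))
        unused.remove(best)
        pairs.append((nurse_id, best[0]))
    return pairs


# Hypothetical records for illustration only.
nurses = [("N1", 18, 34), ("N2", 19, 60), ("N3", 24, 10)]
education_majors = [("E1", 18, 35), ("E2", 20, 61), ("E3", 18, 50)]
pairs = match_pairs(nurses, education_majors)
```

In this made-up example the third nurse has no education major within tolerance and is left unmatched, which is the kind of case that shrinks a matched sample.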

The group MMPI was administered to 
all students prior to entering college. The 
score sheets were then analyzed to determine 
whether a group of questions could be identi- 
fied which would differentiate one group of 
students from the other. A total of 66 items 
were singled out, the critical ratio of the per- 
centage difference being 2.00 or greater for 
each of the 66 items (23 of these items were 
significant at the .01 level). These items 
were broken down into four named categories 
and one miscellaneous group by the investi- 
gator. These categories, which presumably 
identify personality characteristics of the stu- 
dent nurse as contrasted with those of educa- 
tion majors, are presented in Table 2. Also 
given are the item number of the Group 
MMPI, the t value of the difference, and the
answer characteristic of the nursing student. 

The first category of ten items, labeled “A 
Social-Sexual Factor,” is characterized by a 
preference for the mannish and for masculine 
activities. The student nurse admits a pref- 
erence for association with her own sex; she 
likes the tall mannish woman. There is noth- 
ing of the feminine coquette in her make-up 
nor does she find pleasure in social dancing. 
Soldiering, reporting sports news, forestry 
work, all have their appeal for her. 

The second group includes twelve items 
and seems to delineate a conventional adher- 
ence to custom and a prudish, decorous atti- 
tude usually absent in today’s coed. She is 
embarrassed by dirty stories. She avoids 
sexy shows; she is not in on the gossip of the 
group. She disapproves of women smoking 
and imbibing alcoholic beverages. She does 
not believe that men are absorbed with the 
subject of sex. A person should be punished 
for breaking the law, even if it is unreason- 
able. Duty to a life goal inspires and moti- 
vates her. 

Minimal psychosomatic concern is the label 
given to the third group consisting of eleven 
items. Two of these items have high t values.
The first one, “The sight of blood neither
frightens me nor makes me sick,” with a t
value of 4.33, ranks highest in the total list
with the exception of two items in the miscellaneous
group which refer to a liking for science.
A t value of 3.83 is obtained for the
other item, “I sweat very easily even on cool 
days.” The student nurse does not worry 
about her health, does not dread seeing a doc- 
tor, can’t remember “playing sick,” does not 
feel tired. Symptoms of hypochondria are 
lacking. 

The responses by the student nurse to the 
items making up the fourth category, labeled 
“Freedom from Neuroticism,” are indicative 
of emotional stability. This category con- 
tains the largest number of items, twenty in 
all. The difference is significant at the .01 
level of confidence for seven items. The stu- 
dent nurse denies fear of people, the dark, 
and high places. Anxiety and tension are not 
part of her everyday experience. Home life 
is pleasant and few quarrels with members of 
her family are admitted. She makes no claim 
to personal importance, but states that she 
expects to succeed in the activities which she 
attempts. 

The miscellaneous category, including twelve 
items, presents some difficulties of interpreta- 
tion. Responses to some items, as the first 
two, referring to a liking for science and sci- 
ence reading, are easily understood on a com- 
mon sense basis. As previously stated these 
two items have the highest t values of the entire
list, being respectively 8.17 and 7.66. It
seems fairly obvious that candidates for nurs- 
ing training would check such items as true. 
Another group of responses seems almost para- 
doxical in the light of previous interpreta- 
tions. Items 102, 461, and 417, all answered 
as true by the nurse, might be interpreted as 
contradictory of claims made for freedom 
from neuroticism. The writer has no ration- 
alization to offer. These items refer respec- 
tively to the hardest battles being with one- 
self, difficulty in setting aside a task once 
begun, and annoyance with anyone who tries 
to get ahead in line. Three items, 89, 199 
and 261, of rather low discrimination, seem 
to have little significance for this study as far 
as interpretation is concerned. Three other 
items, all answered false, have reference to 
occupational preference and membership in 
Table 2


The Significance of the Difference Between the Responses of 86 Student Nurses and 86 College Women 
in Education Curricula to 66 Categorized Items on the Group Form of the 
Minnesota Multiphasic Personality Inventory 


MMPI Item Number    t Value of Diff.    Marking Characteristic of Student Nurses    Categories and MMPI Questions


A Social-Sexual Factor Characterized by a Preference for the 
Mannish and Masculine Activities 
3.00 True I like mannish women. 
2.60 True I am very strongly attracted by members of my own sex. 
2.00 True Usually I would prefer to work with women. 
2.00 False I love to go to dances. 
2.14 True I like tall women. 
2.00 True I would like to be a soldier.
2.00 True If I were a reporter I would very much like to report sporting news. 
2.66 True I very much like horseback riding. 
2.14 True I think I would like the kind of work that a forest ranger does. 
False I like to flirt. 
A Conventional Attitude 
True I am embarrassed by dirty stories. 
True I am quite often not in on the talk and gossip of the group I belong to. 
True I do not like to see women smoke. 
True I believe that a person should never taste an alcoholic drink. 
True I have been inspired to a program of life based on duty which I 
have since carefully followed. 
True I never attend a sexy show if I can avoid it. 
True I am against giving money to beggars. 
False When a man is with a woman he is usually thinking about things 
related to her sex. 
False If I could get into a movie without paying and be sure I was not 
seen I would probably do it. 
False A person shouldn’t be punished for breaking a law that is un- 
reasonable. 
True I am easily downed in an argument. 
False I like to go to parties and other affairs where there is lots of loud fun. 
Minimal Psychosomatic Concern 
True The sight of blood neither frightens me nor makes me sick. 
True I have never had a fainting spell. 
True I do not have spells of fever or asthma. 
True I am almost never bothered by pains over the chest or in the heart. 
True I seldom worry about my health. 
True I do not dread seeing a doctor about a sickness or injury.
False I sweat very easily even on cool days. 
False I can remember “playing sick” to get out of doing something. 
False I am a high-strung person. 
False I feel tired a good deal of the time. 
False I am neither gaining nor losing weight. 
Freedom from Neuroticism 
True I believe that my home life is as pleasant as that of most people 
I know. 
True I have very few quarrels with members of my family. 
False I find it hard to keep my mind on a task or job. 
False I am an important person. 
Table 2 (Continued)

MMPI Item Number    t Value of Diff.    Marking Characteristic of Student Nurses
165 3.14 
163 4.00 
370 3.00 
352 2.20 
388 2.20 
166 2.00 
2.00 
2.00 
2.66 


Categories and MMPI Questions 





I have several times had a change of heart about my life work. 

I do not tire quickly. 

I hate to have to rush when working. 

I have been afraid of people or things that I knew could not hurt me. 

I am afraid to be alone in the dark.

I am afraid when I look down from a high place. 

I have more trouble concentrating than other people seem to have. 

I am often afraid of the dark. 

I am anxious and upset when I have to make a short trip away 
from home. 

2.00 

2.20 

2.20 

2.00 


Several times I have been the last to give up trying to do a thing. 

I work under a great deal of tension.

I often feel as if things were not real. 

I do not blame a person for taking advantage of someone who lays 
himself open to it. 

2.40 

2.20 


I usually expect to succeed in things I do. 

I have at times had to be rough with people who were rude or 
annoying. 

I prefer work which requires close attention to work which allows 
me to be careless. 


2.43 


Miscellaneous 
8.17 
7.66 
2.29 
3.29 


I like science. 

I like to read about science. 

My hardest battles are with myself. 

I find it hard to set aside a task that I have undertaken, even for a 
short time. 

3.20 I am often so annoyed when someone tries to get ahead of me in a
line of people that I speak to him about it.

It takes a lot of argument to convince most people of the truth. 

2.00 Children should be taught all the main facts of sex. 

2.00 If I were an artist I would like to draw flowers. 

3.83 False I would like to be a journalist.

2.29 

3.40 


2.00 


I like to read newspaper editorials. 

The only miracles I know of are simply tricks that people play on 
one another. 

2.43 I should like to belong to several clubs or lodges. 


clubs. Similar items answered as true seemed 
to fit into the Social-Sexual category. Is it 
because the latter had a definitely masculine 
connotation for the student nurse or is some 
other elusive factor operative? Item 387, 
which refers to miracles as simply tricks 
played by people on one another, as answered 
by the student nurse may have reference to 
some sort of youthful idealism which she en- 
tertains regarding possible miracles performed 
by the members of the medical profession, 
Divine aid perhaps being assumed. The t
value of 3.29 signifies high validity for this
item. 

As a means of validation, the 66 items were 
mimeographed on a single sheet and the stu- 
dents were asked to recheck the list. This 
presents a slightly different situation from 
that obtaining in the first instance when the 
66 items were scattered through the total ag- 
gregate of MMPI items. Furthermore, the 
cadets were now established in their training 
program whereas previously they were not 
certain of admission to the nursing school. 
Mean scores were computed for both the
nursing and the college groups and compared 
with the original averages for the 66 items. 
In each instance, the average number of plus 
answers was lower than in the original tests, 
but the difference between the averages of the 
groups remained approximately the same. A 
further check was attempted when two groups 
of nursing students from the Bishop Johnson 
College of Nursing and the Hollywood Pres- 
byterian Hospital School of Nursing, both in 
Los Angeles, California, were asked to check 
the 66 items. The mean score of the latter 
group, from Hollywood Presbyterian School, 
was very close to that of the original mean 
score and higher than the mean retest score 
of the Knapp group. In the Bishop Johnson 
College group, the mean score was very close 
to that of the Knapp retest score. When the 
significance of the difference between the 
mean scores of the nurses and the college 
women was tested, CR’s were found to vary 
from 5.15 to 11.30. The lowest CR was ob- 
tained from a comparison of the mean scores 
of Santa Barbara students and Bishop John- 
son School nurses; the largest CR, 11.30, was 
obtained from a comparison of the mean 
scores of Santa Barbara students and the 
Knapp College students on the original test. 
Additional comparisons yielded scores inter- 
mediate between these extremes. These data 
are given in Table 3. The Pearson product 
reliability of the 66 item test as measured by 
the split-half, odd-even technique was .64. 
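
The critical ratios between group means can be checked approximately from the means and standard errors in Table 3. The sketch below assumes the usual form, the mean difference divided by the square root of the sum of the squared standard errors; the article does not print its formula, so small departures from the reported values are to be expected. The function name is illustrative.

```python
import math


def critical_ratio_of_means(mean_a, se_a, mean_b, se_b):
    """Difference between two independent means over its standard error."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Means and standard errors of the mean as given in Table 3 (original tests).
santa_barbara = (32.2, 0.67)
bishop_johnson = (37.1, 0.67)
hollywood = (40.0, 0.73)

cr_bishop = critical_ratio_of_means(*bishop_johnson, *santa_barbara)
cr_hollywood = critical_ratio_of_means(*hollywood, *santa_barbara)
print(round(cr_bishop, 2), round(cr_hollywood, 2))
```

Under this assumption the Bishop Johnson comparison comes out near 5.17, close to the 5.15 reported above as the lowest CR.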
Summary


1. An investigation into the personality at- 
tributes of student nurses as compared with 
education majors utilized the responses of the 
group MMPI as the basis for study. A total 
of 86 women students enrolled in the Knapp 
College of Nursing at Santa Barbara were 
matched for race, age and ACE aptitude per- 
centile with an equal number of education 
majors at the University of California, Santa 
Barbara College. 

2. From the total number of MMPI re- 
sponses 66 items were singled out to make up 
a scale which differentiated one group from 
the other. The criterion of selection was a t
of 2.00 or greater. Of the 66 items, 23 were 
found to be significant at the .01 level of con- 
fidence. The odd-even, split-half technique 
yielded an r of .64 as the reliability of the 
66 point test. 

3. The 66 point test gave mean scores which 
differentiated the two groups to approxi- 
mately the same degree as did the original 
tests, although the absolute scores were lower 
in each case. The CR of the mean difference 
between the two groups in the original test 
was 11.30. When the students were retested 
with the 66 item test, the CR was 9.98. The 
difference in mean scores between the two 
groups in the original test was 9.38 and in the 
retest 9.68. Two other groups of student
nurses in Los Angeles schools who were tested
on the 66 item test yielded average scores
similar to the experimental groups. The CR’s
of the mean difference for these respective
groups when compared with the Santa Barbara
College group were 5.15 and 7.95.


Table 3

Mean Scores for Sixty-six Items, Original and Retest Data

Student Groups                                   N     Mean    Sigma   SEm

Original Tests
1. Knapp College of Nursing*                     86    41.6    4.5     .48
2. Santa Barbara College*                        86    32.2    6.2     .67
3. Bishop Johnson School of Nursing              50    37.1    4.7     .67
4. Hollywood Presbyterian School of Nursing      57    40.0    5.5     .73

Retest
5. Knapp College of Nursing                      54                    .62
6. Santa Barbara College                         36    29.1    4.4     .75

* Sixty-six items embodied in total matrix of MMPI.

4. The 66 items were broken down into 
four categories presumably identifying per- 
sonality attributes of the student nurse. 
These were labeled a Social-Sexual Factor, a 
Conventional Attitude, Minimal Psychoso- 
matic Concern, and Freedom from Neuroti- 
cism. An additional group of 12 items made 
up a miscellaneous category. 

5. The study offers evidence that the stu- 
dent nurse presents a significantly different 
pattern of response for a small number of 
selected items on the group MMPI when 
compared with a group of college education 
majors. Presumptive evidence is furnished 
that the student nurse is a more stable in- 
dividual who exhibits a preference for her 
own sex and likes mannish qualities in her 
associates. She is fastidious and conven- 
tional in her attitude and is duty inspired. 
Symptoms of hypochondria are lacking as is 
evidence of neuroticism. These findings, in
general, corroborate the findings of Elwood 
and Lough. Though both Lough and the 
writer used the group MMPI as the basis for 
analysis, the approach is somewhat different. 
Lough utilized the mean scores from the 
MMPI profile while the writer used the in- 
dividual item responses. 


Received January 9, 1953. 
Item Validity of the Lee-Thorpe Occupational Interest Inventory


Leopold Bridge and Meyer Morson 
Baltimore Regional Office, Veterans Administration 1


During the use of the Occupational Interest 
Inventory, Advanced, for the identification of
specific interest areas it was noticed that some 
of the test items did not appear to relate 
closely to the specific areas for which they 
were scored. This apparent discrepancy led 
the authors of the present article to explore 
the general concepts of validity underlying the 
construction of the Lee-Thorpe Inventory to 
determine whether their observations had any 
bearing on the usefulness of the test.

A search of the available published ma- 
terial showed little relating to the validity of 
the Occupational Interest Inventory. Super 
wrote in 1949 (6) that he had located no 
studies of its validity; and no validity studies 
are reported in the Third Mental Measure- 
ments Yearbook (3). A study by McPhail 
(5) described the establishment of inter- 
est profiles for various occupational groups 
through the use of the Inventory but did not 
examine the validity of the test as a measur- 
ing instrument. 

No validity data are presented in the 
Manual of Directions (4) for the Inventory 
but the authors state that the observation of 
the following criteria has contributed to the 
validity of the tests: (1) the selection of the 
items; (2) the design or description of the 
items; (3) the balance of the items consti- 
tuting the Inventory; and (4) the presenta- 
tion of the items. 

The importance of these criteria is apparent 
from an inspection of the test. This study 
concerns only Part I which consists of 120 
pairs of items each member of a pair being 
identified by the authors with one of six major
occupational fields designated by a letter
and a descriptive phrase: (A) Personal-Social
(P.-S.); (B) Natural (Nat.); (C) Mechanical
(Mech.); (D) Business (Bus.); (E) The
Arts (Ar.); and (F) The Sciences (Sci.).

1 This work was not performed in connection with
Veterans Administration activities and does not involve
VA records. The opinions expressed are those
of the authors and not necessarily those of the Veterans
Administration. The authors wish to express
their sincere appreciation for the cooperation of Miss
Banos of the Maryland Employment Service, Dr.
Sprol of the Veterans Administration, and Dr. Terwilliger
of the Maryland State Vocational Rehabilitation
Division, and their respective staffs.

Each major field contains 40 items each 
presumably descriptive of the respective field. 
It is, obviously, essential that extreme care 
be exercised in the selection of each test item 
when a total of 120 responses will result in de- 
termining the order of preference among six 
areas of interest. 

Unless an activity is properly designated, 
preference for it will distort two scores: the
field for which it is scored and the field to 
which it really belongs. In the event that the 
items are not properly representative of the 
occupational fields for which they are scored 
it would be expected that the testees would 
express disagreement with their inventoried 
interest patterns or with the relative rankings 
of their inventoried interests. 

In two studies Brown (1, 2) compared the 
expressed and inventoried interests of veter- 
ans as measured by the Lee-Thorpe Occupa- 
tional Interest Inventory and found that there 
were significant differences. Although Brown 
did not discuss the question it is possible that 
some of the discrepancy was due to the fact 
that items in the test were improperly desig- 
nated as to the occupational field. 


Method and Procedure 


In order to evaluate the hypothesis that 
some of the Lee-Thorpe items were not prop- 
erly designated it was decided to analyze each 
item to see if it was assigned to the proper 
occupational field. It was felt that the per- 
sons best qualified to make such a determina- 
tion would be those who had considerable ex- 
perience in occupations either from the point 
of view of counseling or occupational analysis. 
The items were therefore reviewed by a group 
of 38 raters, including 18 Counselors or Oc- 
cupational Analysts from the Maryland State 
Employment Service with an average experience
of 12 years, 15 Vocational Counselors from the
Maryland State Rehabilitation Division whose
experience averaged 4½ years, and 5 Vocational
Advisers from the Veterans Administration with
an average counseling experience of 6 years.


Table 1

Field of       No. of     Items
Interest       Items      Questioned

P.-S.            40          14
Nat.             40          26
Mech.            40          30
Bus.             40          12
Ar.              40          23
Sci.             40          28

All Fields      240         133

Each person was asked to assign each of the 
240 test items to the Field of Interest in 
which he felt that it belonged. The possi- 
bility of being influenced by the prior desig- 
nations of the test constructors was eliminated 
by securely masking the letter designations of 
the items printed in Part I of the Inventory. 

The responses of the 38 qualified raters 
were scored in terms of dissents, i.e., assign- 
ment of an item to a Field other than that 
designated by the authors. The few instances 
where raters felt that an item did not fit into 
any of the six fields of interest were also 
scored as dissents. On this basis the dissent 
score for each of the 240 items could have any 
value from 0 to 38. A score of 0 indicates
that all the raters assigned the item to the
same field as the authors of the test while a
score of 38 indicates that none of the raters
felt that the item belonged in the field to 
which it had originally been assigned. 
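
The scoring rule just described can be sketched in a few lines. This is only an illustration: the function name and the sample ratings are hypothetical, and a rater who felt that an item fit none of the six fields is represented by None.

```python
def dissent_score(author_field, rater_fields):
    """Count the raters whose assignment differs from the authors' field.

    An entry of None stands for a rater who felt the item fit none of
    the six fields; it is counted as a dissent, as in the text.
    """
    return sum(1 for field in rater_fields if field != author_field)


# Hypothetical item scored Nat. by the authors and rated by 38 raters.
ratings = ["Nat."] * 9 + ["Sci."] * 28 + [None]
print(dissent_score("Nat.", ratings))   # 29
```

With 38 raters the score runs from 0 (every rater agrees with the authors) to 38 (no rater agrees).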


Results 


After tabulating the dissent scores by item
and by occupational field a wide scattering
of scores was apparent. Opinion ranged from
total agreement to total disagreement with a
general tendency to cluster at the extremes.
Some degree of dissent was found for 133 of
the 240 items. The Fields which had the
fewest questioned items were the Business and
Personal-Social, while the greatest number of
challenged items were in the Mechanical and
Scientific Fields.


Table 1 (continued)

Analysis of Dissent Scores for Major Occupational Fields

Field of       Items Not      No. of
Interest       Questioned     Dissents

P.-S.              26            165
Nat.               14            341
Mech.              10            596
Bus.               28             76
Ar.                17            305
Sci.               12            346

All Fields        107           1829

[The Mean Dissents Per Item column is not legible in the source.]

In view of the fact that dissent scores 
ranged from 0 to 38, it was felt that the
intensity of the dissent per item should also 
be considered in evaluating the soundness of 
the various fields of interest. 

Table 1 indicates that the Business Field 
was considered as the soundest area in the 
test. In addition to having the fewest items 
questioned, it also has the lowest total num- 
ber of dissents. On this same basis the six 
fields of the Occupational Interest Inventory 
may be ranked from most valid to least valid 
as follows: 1. Business; 2. Personal-Social;
3. Arts; 4. Natural; 5. Scientific; and 6.
Mechanical. This order expresses the relative
degree to which the items in the test
were considered to actually correspond to the 
field for which they are scored and which they 
purport to measure. 

Further analysis of the dissent scores can 
be used for an evaluation of the individual
item validities. On the basis of these scores 
the test items can be divided into three major 
categories: 


1. No Dissents ..................... Raters place item in the same occupational field as
                                     the authors.

2. Low Dissent Score ............... Some disagreement as to the proper placement of
                                     the item but too slight to justify positive assertion
                                     that item is improperly placed.

3. Significant Dissent ............. Raters believe that item does not belong in field
                                     assigned by authors.

   A. High Agreement among raters .. When there is a high degree of agreement among
                                     the dissenting raters the item may be considered as
                                     sound but belonging in a field other than that
                                     assigned by the authors.

   B. Disagreement among raters .... Item is not sound because there is no general
                                     agreement as to the field to which it should be
                                     assigned.


As has been previously stated there are 107 
items (out of 240) concerning which the raters 
are in complete accord with the authors of the 
test. Further, if we assume that a dissent score 
of 12 or less (at least 66% agreement with the 
authors) indicates a satisfactorily classified 
item, we find that 76 additional items or a 
total of 183 should be considered as assigned 
to the proper occupational field. In other 
words, the raters agree that 76% of the ques- 
tions in Part I of the Occupational Inventory 
meet the necessary criteria of validity for in- 
clusion in the test. 
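
The arithmetic behind the cutoff and the 76% figure can be checked with a short sketch. The counts (38 raters, 240 items, 107 and 76 items) come from the text; the function names are illustrative only.

```python
N_RATERS = 38
N_ITEMS = 240


def agreement_with_authors(dissent_score, n_raters=N_RATERS):
    """Proportion of raters who placed the item in the authors' field."""
    return (n_raters - dissent_score) / n_raters


def is_satisfactory(dissent_score, cutoff=12):
    """Apply the cutoff adopted in the text: 12 dissents or fewer."""
    return dissent_score <= cutoff


no_dissent, low_dissent = 107, 76          # item counts reported in the text
satisfactory = no_dissent + low_dissent

print(round(agreement_with_authors(12), 3))                 # 0.684
print(satisfactory, round(100 * satisfactory / N_ITEMS))    # 183 76
```

A dissent score of 12 corresponds to 26 of 38 raters agreeing with the authors, which is slightly above the 66% floor quoted above.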

There are 24 items with dissent scores of
26 or more, which indicates substantial
disagreement with the authors’ classification of
the items, and on which, in addition, there is
significant agreement among the raters as to
the proper placement of the items. Of these
items 11 are now classified in the Mechanical 
Field, 4 each in Arts and Sciences, 3 in Natu- 
ral and 2 in Personal-Social. As a result of 
the raters’ revisions, 5 should be in the Per- 
sonal-Social Field, 3 in the Natural, 3 in the 
Mechanical, 3 in the Business, 4 in the Arts, 
and 6 in the Sciences. The following exam- 
ples are characteristic of the items that the 
raters scored as being in improper categories: 


Occupational Field Assigned by Authors, by Raters; Item as Shown in Inventory

Nat. Sci. “Plan experiments to control worms, insects and other pests.”

“Teach people how to improve their manners and poise.”

“Clean and recharge storage batteries.”

Art. “Paint signs on windows or do lettering on posters with brush or pen.”

Sci. “Experiment with the making of synthetic products, such as artificial teeth, nylon or cellophane.”


P.-S. 


Mech. 


Up to this point the discussion has ac- 
counted for 207 items. The balance consists 
of the 33 controversial items where no sub- 
stantial number of the raters could agree as 
to the proper occupational designation. The 
following items are characteristic of this group 
(the numbers in parentheses indicate the num- 
ber of raters preferring the occupational field 
designated):


Occupational Field Assigned by Authors and by Raters; Item as Shown in Inventory

“Take care of the correspondence and private affairs of another person.”
    Authors: P.-S.   Raters: P.-S. (23), Bus. (15)

“Direct the quick-freezing or dehydration of farm products.”
    Raters: Nat. ( 9), Sci. ( 9), Bus. (19), Mech. ( 1)

“Label bottles, sort and wrap fruit, or pack eggs.”
    Raters: Mech. ( 5), Bus. (22), P.-S. (count illegible), Sci. ( 1), Nat. ( 5)

“Mow lawns, clip hedges and bushes, trim trees.”
    Raters: Art. ( 3), P.-S. (17), Nat. (23)

“Keep a doctor’s tools and equipment in order.”
    Raters: Sci. (14), P.-S. (20), Mech. ( 4)


A careful inspection of these items will
indicate the basis on which the raters formed
their opinions. While there is some defense
for each choice, there is no way to show
that the item will not be regarded in various
ways by persons taking the test, and such items
therefore cannot be said to meet the criteria of
validity as set forth in the Manual.

A study of the results of this survey com- 
pares in an interesting fashion with the study 
made by Brown with 60 veterans which in- 
volved a comparison of expressed and inven- 
toried interests as shown by the Lee-Thorpe 
Occupational Interest Inventory (2). One of 
the conclusions reached was that 74.4 per cent
of the veterans felt that their expressed
interests corresponded with the relative ranking
of their interests as shown by the test. He
found that the greatest dissent occurred in
connection with scores in the Mechanical
Field, with a bias toward the belief that the
scores were too low. A review of the test
items by a group of raters experienced in the
fields of vocational counseling, placement and
job analysis reveals that 11 of the 40 items
now classified as “Mechanical” actually
belong in other fields but that only three items
otherwise classified should be considered as
“Mechanical.” This leaves the test with only
32 items in this field, so that in many
instances mechanical interests may be
inadequately measured; this may account for the
expressed dissatisfaction found by Brown.

The results of the present study indicate
that room exists for further detailed work on
the selection of items for inclusion in the
Inventory. The problem of selecting items that
can be considered as belonging exclusively to
one field is difficult but not impossible, as
shown by the number of items on which the
raters were in complete accord with the
authors. It is essential, however, to see that
the items do not contain activities or elements
that belong to more than one field. It would
be even more important to demonstrate that
the items also do, in fact, discriminate
between occupational groups à la Strong.


Summary and Conclusions 


1. The 240 items of Part I of the Lee-Thorpe
Occupational Interest Inventory, Advanced
were analyzed in terms of agreement
of raters with the occupational fields for
which they are scored.

2. The analysis was performed by 38 raters
with extensive backgrounds in vocational
counseling and occupational analysis.

3. Scoring was done in terms of dissents,
i.e., disagreement with the authors’
classification of the items.

4. Raters were in complete accord with the 
authors of the Inventory on 107 items and in 
substantial agreement on 76 more items. 

5. Raters felt that 24 items were in occu- 
pational fields other than those in which they 
are now assigned, the Mechanical Field being 
least reliable with 11 out of 40 items con- 
sidered to be improperly classified. 

6. No substantial agreement was reached as 
to the proper occupational classification of 33 
more items. 

7. Since the validity of 57 of the 240 items 
is questionable, caution should be used in the 
interpretation of the interest pattern obtained 
through use of the Inventory. 


Received December 10, 1952.


In each of the polls of the Purdue Opinion
Panel a brief scale is included to measure the 
socio-economic status of the respondents in 
a nationally representative sample of high 
school youth numbering from 8,000 to 18,000. 
The purposes, scope and details of the operation
of the Panel have been described elsewhere
(1, 2, 5, 6). The items in this brief scale
were originally taken from The American 
Home Scale as the items with highest validity 
(4). They have been slightly revised on oc- 
casion because of obsolescence of an item. 
For example, an item asking whether any 
member of the family had been on relief was 
valid in 1940, but obviously not in 1953. The 
items used in the present study are as follows. 

House and Home: Answer these questions
by checking “yes” or “no” in the space
below.

“Does your family have:

    a vacuum cleaner?
    an electric or gas refrigerator?
    a bathtub or a shower with running water?
    a telephone?
    an automobile?

Have you had paid lessons in dancing, dramatics, expression,
elocution, art, or music outside of school?”


The Problem 


The problems of the present study were: 
(1) testing the unidimensionality of these 
items by means of the Guttman test of scala- 
bility (3); and (2) testing the validity of the 
scale. 


Procedure 


Two independent random samples of 100 
respondents’ records were drawn from a total 
available sample of about 10,000 by taking
every nth individual record. Each sample of
100 was then tested by means of the Guttman 
technique (3), once by restricting cut-off 
points to score boundaries, and again by the 
“ideal” method. 

All pupils in the two samples reported at 
least one of the items. Possession of an auto- 
mobile did not, however, distinguish anywhere 
along the line, thus showing it to be a non- 
discriminating item. Quite possibly the make, 
year and model of the automobile might be 
found to be relevant to such an index of socio- 
economic status, but this information was not 
in hand. The other items scale satisfactorily 
by Guttman’s criterion of 90 per cent repro- 
ducibility. 
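
As a rough illustration of the 90 per cent criterion, the sketch below computes a coefficient of reproducibility for yes/no items. It is not the procedure used in the study: errors are counted here against the ideal pattern for each respondent's total score, which is only one of several counting conventions, and the response data are hypothetical.

```python
def reproducibility(responses):
    """Coefficient of reproducibility for 0/1 item responses.

    responses: list of equal-length lists, one list per respondent.
    Items are ordered from most to least frequently endorsed, and each
    respondent's pattern is compared with the ideal scale pattern for
    the same total score. Every mismatch counts as one error.
    """
    n_items = len(responses[0])
    totals = [sum(row[j] for row in responses) for j in range(n_items)]
    order = sorted(range(n_items), key=lambda j: -totals[j])
    errors = 0
    for row in responses:
        score = sum(row)
        ideal = [1] * score + [0] * (n_items - score)
        observed = [row[j] for j in order]
        errors += sum(o != i for o, i in zip(observed, ideal))
    return 1 - errors / (len(responses) * n_items)


# Hypothetical answers of five respondents to four yes/no items.
sample = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 1, 0],   # departs from the ideal pattern [1, 1, 0, 0]
]
print(round(reproducibility(sample), 2))   # 0.9
```

Only the last respondent departs from a perfect scale pattern, giving 2 errors in 20 responses.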


Validity of the Scale 


Validity may be variously defined but per- 
haps its most acceptable meaning is in terms 
of the prediction of a criterion. It is essen- 
tially in this sense that we use the term here. 
The data are taken from Purdue Opinion 
Panel Poll Reports in the form of statistically 
reliable differences between the low and the 
high status groups as defined by the scale. 
All differences reported here are at the 1 
per cent level of confidence or better. The 
critical ratios range from 2.6 to 9.5. The 
stratified-random sample is usually between 
2,000 and 3,900 respondents with from 20 to 
25 per cent in the high socio-economic status 
group and from 75 to 80 per cent in the low 
group. 
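
A critical ratio of the kind cited here can be sketched as the difference between two percentages divided by its standard error. The group sizes are those given with the tables (N = 1809 and N = 646); the percentages are hypothetical, and the unpooled standard error is an assumption, since the formula actually used is not stated.

```python
from math import sqrt


def critical_ratio(pct_low, n_low, pct_high, n_high):
    """Difference between two percentages divided by its standard error."""
    p1, p2 = pct_low / 100, pct_high / 100
    se = sqrt(p1 * (1 - p1) / n_low + p2 * (1 - p2) / n_high)
    return (p1 - p2) / se


# Hypothetical item checked by 36% of the low and 26% of the high group.
cr = critical_ratio(36, 1809, 26, 646)
print(cr > 2.58)   # True
```

A ratio of about 2.58 or more in absolute value corresponds to the 1 per cent level on a two-tailed normal test.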

In Poll No. 21 in 1949¹ the students in the
national sample were asked to check their
individual problems in a list of 300 such problem
items. The items in Table 1 summarize the
breakdown of responses which yielded reliable
differences, i.e., reliable correlations between
socio-economic status as measured and the
incidence of problems checked by the low and the
high socio-economic status groups. The items
are the 39 that yielded such differences.


1 Results of this poll have been published as the
SRA Youth Inventory by Science Research Associates,
57 West Grand Avenue, Chicago 10, Illinois.
The Manual gives the technical data on reliability,
item-test correlations, norms, etc.


Scalability and Validity of Socio-economic Status Items


Table 1

Percentage Differences between Low and High Socio-economic Status Groups on Problems
that Predominate in the Lower Group

Note: all differences reliable at the 1% level of confidence or better.

Item

I would like to have more vocational courses.
What shall I do after high school?
Should I go to college?
I can’t afford college.
I must select a vocation that doesn’t require college.
What jobs are open to high school graduates?
How do I go about finding a job?
I feel that I’m not as smart as other people.
I get stage fright when I speak before a group.
I wish I could carry on a pleasant conversation.
I want to learn to dance.
There aren’t enough places for wholesome recreation where I live.
55. I can’t find a part-time job to earn spending money.
I have no quiet place where I can study at home.
I can’t get along with my brothers and sisters.
I wish I had my own room.
My teeth need attention.

Beyond demonstrating validity of the status 
scale, the results summarized in Table 1 also 
give something of a qualitative picture of the 
kinds of preoccupations, worries and prob- 
lems which characterize the low status group. 
By way of contrast it is of interest to examine 
the items that yield significantly higher pro- 
portions of responses from the high status 
group. They are shown in Table 2. 

It is of interest to note that, if we take the 
averages of Low and High groups in Tables 1 
and 2 as indices of the amount of worrying 
among teen-agers in the country as a whole, 
it is evident that the two groups worry about 
the same amount. They differ sharply, however,
in the kinds of worries that they have.


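[Table 1, numerical columns: Socio-economic Status, Low (N = 1809),
High (N = 646), Difference. These columns are only partly legible
in the source; the surviving entries, in order, are: 31, 6; 51, 14;
36, 10; 24, 15; 16, 10; 45, 20; 37, 7; 36, 9; 55, 10; 36, 8;
35, 10; 46, 12; 30.]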


Summary and Conclusions 


The Guttman test of scalability was ap- 
plied to two independent random samples 
drawn from a total sample of approximately 
10,000 high school pupils’ responses. Validity 
of the scale was investigated in terms of sig- 
nificant differences between items in the SRA 
Youth Inventory and the socio-economic 
status index used in the Purdue Opinion 
Panel. The data support the following con- 
clusions. 

1. The items of the Socio-economic Status 
Index are scalable and represent substantially 
a unidimensional scale. 

2. The scale is valid in that it correlates 
significantly with individual problems re- 
ported by a national sample of high school 
pupils. 

3. The two groups into which the respondents
are divided have about the same amount
of worries, but these are qualitatively very
different for the two groups.


Table 2

Percentage Differences between Low and High Socio-economic Status Groups on Problems
that Predominate in the Upper Group

Note: all differences reliable at the 1% level of confidence or better.

[Column headings: Item; Socio-economic Status, Low % (N = 1809), High % (N = 646);
Difference. Only the first numerical column is legible in the source.]

I would like to take courses that are not offered in my school.          31
I have too much homework.                                                19
For what work am I best suited?                                          54
How much ability do I actually have?                                     58
I want to know more about what people do in college.                     35
How shall I select a college?                                            34
I want to make new friends.                                              49
I have a desire to feel important to society or to my own group.         19
I’d like to know how to become a leader in my group.                     21
I have difficulty budgeting my time.
I want to be accepted as a responsible person by my parents.             18
I don’t get enough sleep.                                                14
How can I help get rid of intolerance?                                   12
How can I help to make the world a better place in which to live?        27
What can I do about the injustice all around us?                         13
I’m worried about the next war.                                          29
Is there something I can do about race prejudice?                        21
Is there any way of eliminating slums?                                   21
What can I do to help get better government?                             13
5. How can I learn to use my leisure time wisely?                        23
What can I contribute to civilization?                                   10
I wonder about the after life.                                           20


3. Guttman, L. The Cornell technique for scale
and intensity analysis. Educ. psychol. Measmt.,
1947, 7, 247-279.


In any group there is variation in the extent
of participation by different members in the 
activities of the group with some members 
more active than others. Among other fac- 
tors, it is likely that differences in the per- 
sonal characteristics of the members will help 
to explain the variation in extent of par- 
ticipation. 

Many voluntary organizations have both a 
social and an organizational aspect with two 
corresponding areas in which group members 
may participate. Participation in the more 
purely social activities of the group may be 
quite different from working to recruit mem- 
bers, planning and arranging meetings, etc. 
This study is concerned with the latter type of 
participation. 

The purpose of this study is to explore two 
general hypotheses concerning the differences 
between active members (AM) and passive 
members (PM): 

1. AM as compared with PM are more 
motivated to participate in organizational ac- 
tivities, become more involved in the organiza- 
tion and derive more satisfaction from the 
organization. These differences are due, in 
part, to the possession to a greater degree by 
the AM group of personality characteristics 
which dispose them to become interested in 
and participate in groups. 

2. AM more than PM have abilities which
are probably related to effective action in
organizations. In this study, it is assumed that
the skills required to perform organizational
functions are largely verbal and are adequately
measured by an academic aptitude test.


1 This report is one of a series of research studies
in student life being conducted by the Office of the
Dean of Students, University of Minnesota. Various
staff members gave helpful advice and assistance with
this study. The author is grateful to Jack Laugon,
St. Olaf College, formerly with the Student Activities
Bureau, Office of the Dean of Students, University
of Minnesota, for his assistance in a pilot phase
of this investigation.

This study was supported in large part by a research
grant from the Graduate School of the University
of Minnesota.


Method 


A questionnaire was given to 19 of the 20 
academic sororities on the campus of the Uni- 
versity of Minnesota. These questionnaires 
were completed by approximately 90% of the 
entire sorority membership. Two questions 
were used to select the subjects. The ques- 
tion used to select the AM of the sorority was, 
“List the names of the members of your so- 
rority who would be a real loss to the sorority 
if they became inactive.” The question used
to select the PM was, “List the names of the 
members of your sorority who do not seem to 
have much interest in the sorority.” To be 
included in the sample as an AM, a girl had 
to be selected by at least one-third of the re- 
sponding members of her sorority. The cor- 
responding criterion for PM was 10% or 
more. 

In order to control two variables which were 
correlates of active and passive participation 
but not relevant to this study, some members 
of the sample were eliminated: (1) Girls liv- 
ing in the house were excluded because they 
were more frequent in the AM group and not 
frequent enough in the PM group to warrant 
separate analysis. The sample, therefore, con- 
sists entirely of town girls who generally com- 
mute from their homes to the University. 
(2) Girls who were members less than one 
year were excluded to eliminate the effects of 
a short membership period. This procedure 
left 41 AM and 37 PM. Almost all of the AM
held an important sorority office. Few of the 
PM held an office. This outcome suggests 
that our dimension of active and passive mem- 
bership is similar to but not identical with 
the leadership-followership dimension. The
followership classification does not exclude
active members who are not leaders nor does
the leadership classification necessarily
include active members who contribute to the
group but not as leaders. Bird (1) and Stogdill
(5) have excellent summaries of differences
between leaders and non-leaders.

The questionnaire contained items related 
to the member’s satisfactions and dissatisfac- 
tions with her sorority. Five-point rating 
scales elicited self-estimates of importance to 
the group, participation in the group, feelings 
of group belongingness, satisfaction with and 
acceptance of group decisions. Friendship 
choices and extra-sorority activities, as well 
as some background and demographical data 
were also obtained. 

For many of the girls, test scores were on 
file at the Student Counseling Bureau. The 
tests included the Minnesota Multiphasic Per- 
sonality Inventory (MMPI), the ACE Psy- 
chological Examination, and high school rank 
(HSR). 

Chi² is the statistical test of significance
used most frequently here and when used the
distribution is split at the median. 
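
The median-split procedure can be sketched as follows. The scores are hypothetical, the function names are illustrative, and the statistic is computed without a continuity correction, since the text does not say whether one was applied.

```python
from statistics import median


def median_split_table(group_a, group_b):
    """2 x 2 counts: rows are groups, columns are (above, at or below) the median."""
    cut = median(list(group_a) + list(group_b))

    def row(scores):
        above = sum(1 for x in scores if x > cut)
        return [above, len(scores) - above]

    return [row(group_a), row(group_b)]


def chi_square_2x2(table):
    """Chi-square for a 2 x 2 table with 1 degree of freedom, uncorrected.

    Assumes no row or column total is zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical scores for two small groups.
active = [5, 6, 7, 8, 9, 10]
passive = [1, 2, 3, 4, 5, 6]
table = median_split_table(active, passive)
print(table)                              # [[5, 1], [1, 5]]
print(round(chi_square_2x2(table), 2))    # 5.33
```

With 1 degree of freedom a value above 3.84 is significant at the 5% level.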


Results 


The AM derived more satisfaction, in general,
from their membership than the PM.
The first source of evidence for this result as
well as for the “validation” of the selection
method comes from five self-rating scales.
Table 1 shows that the AM more than the
PM believe the group regards them as important,
and consider themselves as participating
in the sororities’ activities more than
most members. A larger proportion of AM
express feelings of group belongingness, are
satisfied with, and agree most of the time with
the group’s decisions.


Table 1

Self-Ratings of Participation, Perceived Importance to
Group, Group Belongingness, Agreement with
Actions of Group, and Acceptance of Group
Decisions when in Disagreement*

                                          Active        Passive
                                          Members       Members
                                          (N = 41)      (N = 37)
                                             %             %

Participate More or Much More in
  Sorority Activities than Most Members  [illegible]        5
Important and Very Important                 71            14
Real Part of Sorority                        98            62
Agree Most of Time with Actions of Group     73            30
Complete Acceptance of Group Decisions       49            11

* All of these differences are significant well beyond
the 5% level by Chi².


Ben Willerman

The second type of evidence relates more 
to the sources of satisfaction and dissatisfac- 
tion than to the amount of satisfaction. The 
free answers to the questions, “What do you 
like most ... ?” and “What do you least like 
about your sorority?” were coded using cate- 
gories defined largely in terms of the actual 
answers. Coding agreement between two 
coders based upon 64 questionnaires for the 
“like most” answers was almost 75% and for 
the “like least” answers, based upon 34 cases, 
was about 80%. Since coding in some in- 
dividual categories was very unreliable, to 
conserve space no table of these results is 
presented and only outstanding differences 
will be discussed. 

Two main differences occur. The AM girls
like the “group spirit, cooperation, and
unity” of their sorority more than the PM
(37% vs. 5%, Chi², P < .05). The AM also
more frequently like least the “lack of interest
or cooperation of some members” (51%
vs. 5%, P < .05). The PM more frequently
like least the “compulsory functions” (16%
vs. 0%, Fisher’s exact test for 2 x 2 tables,
P < .05). Although the difference is significant
at only the 10% level, the PM more
frequently complain that the sorority takes
too much of their time. We may infer from
these differences that the AM derive their 
satisfactions and dissatisfactions from the at- 
tainments or frustrations of the organizational 
goals, while the PM are less oriented this way 
and seem to regard some features of the so- 
rority as an interference with their personal 
life. 

One specific hypothesis concerning the lower 
interest of the PM in the sorority organiza- 
tion was suggested by the staff members of 
the University’s Student Counseling Bureau 
on the basis of their experience with the 
MMPI. Girls who had an intense interest in 
male companionship tended to have on the 
MMPI a relatively high psychopathic deviate 
(Pd) score and a relatively low masculinity- 
femininity (Mf) score (low on this scale means
more feminine). It was reasoned that this
type of attraction to persons outside of the 
sorority would be associated with lowered 
interest in the sorority. On this hypothesis 
the discrepancy scores between the Pd and 
Mf scales were compared for the two groups 
with the expectation that the discrepancy 
scores for the PM would be greater than those 
for the AM. The results together with some 
other test scores are shown in Table 2. While 
the difference between the two groups is not 
significant at the conventional 5% level, the
t test (two-tailed) is at approximately the 9% 
level. There is, then, the possibility that one 
of the causes of low participation in the so- 
rority group is the strong interest in men. 
Another interpretation is that when irresponsi- 
bility or non-conformity (high Pd) is coupled 
with a low interest in organizational functions 
(with which femininity may be correlated), 
the result is a girl neither adept at nor inter- 
ested in the tasks of maintaining an organiza- 
tion. 

If active participation in a group is the 
result of more than just situational circum- 
stances, we should expect AM to be more ac- 
tive in other organizations. While we do not 
have a direct measure of amount of time or 
effort devoted to other organizations we do 
have evidence concerning the number of
memberships held. This measure, which is
probably related to active participation, shows
that the AM belong to more extra-sorority
groups (Chi², P < .01).




The greater participation of the AM in 
other organizations does not seem to be limited 
to the formal aspects of participation. In 
response to the question asking who their best 
friends at the University were, the AM men- 
tioned 2.00 and the PM mentioned 1.09 girls 
on the average who were not members of the 
sorority (Chi², P < .05).

Logically, passive participation may result 
from either low motivation in the direction of 
participation or, if motivation is present, from 
counter-tendencies which oppose participa- 
tion. The latter may be labelled “restraints” 
against participating. A plausible type of 
restraint in social situations is “fear of failure” 
or lack of self-confidence. To test the possibility
that the PM possess characteristics
which block participation, the K scores of the 
MMPI were compared for the two groups. 
The justification for the use of K as a measure 
of self-confidence rests upon an inspection of 
the items, many of which have “face validity,” 
and upon the opinion of some counselors who 
believe that K often reflects genuine self-con- 
fidence rather than defensiveness. The re- 
sults are consistent with the above reasoning. 
The mean K score for the PM is significantly 
lower than for the AM. 

Another condition which may prevent an
individual from contributing to a group is
lack of skills demanded by organizational
tasks. In a sorority the organizational tasks
seem to require a relatively high level of
abstract ability and skill in communicating. A
girl low in verbal ability might either be
discouraged from participating by her fellow
members or impose restraints upon herself
because of fear of failure. Another possibility
is that the necessity for maintaining a
minimum grade average would leave a girl of low
academic ability little time for organizational
activities.


Table 2

Scores on HSR, ACE, and MMPI* for Active and Passive Members

              Active Members            Passive Members
Measure      N     Mean†     SD        N     Mean†     SD        t        P

HSR          37     83.9    16.2       36     64.6    n.l.     n.l.     <.01
ACE          37     64.9    25.6       35     44.7    n.l.     n.l.     <.01
K            31     60.3     7.9       26     53.4    n.l.     n.l.     <.01
Pd           31     52.8     8.5       26     55.8    n.l.     n.l.     >.10
Mf           31     48.8     7.0       26     46.2    n.l.     n.l.     >.10
Pd-Mf        31      4.1    11.0       26      9.6    13.0     n.l.     <.10 > .05
Sie          18§    43.4    n.l.       16§    55.0    10.2     n.l.     <.01

n.l. = not legible in the source.

* Data of other scales of MMPI not reported because they were not regarded as relevant to this study.

† Mean percentile scores for HSR and ACE; mean T scores for MMPI scales.

‡ Critical ratio was used because of unequal variances.

§ The N’s for this scale are fewer than for other scales because some subjects’ tests were scored before the Sie
scale was constructed.

Although we do not have sufficient evidence 
to choose among these alternative hypotheses, 
a comparison of the mean scores of the ACE 
(total) indicates that skill as a determining 
factor is an effective variable. Table 2 shows 
that a difference of 20 points in ACE exists 
between the two groups. In addition, the 
HSR, a measure of past academic achieve- 
ment, gives similar results. 

Since many of the girls in the PM group 
had high ACE scores and many had low Pd- 
Mf discrepancy scores, the question was raised 
as to why these girls were passive participants. 
The answer to this question may be found in 
the correlation between these two sets of 
scores which for the PM is r = .45 (N = 25, 
P< .05). Thus, the more a PM has an in- 


dicator related to active participation (high 
ACE), the more she tends to have an indica- 
tor which is related to passive participation 


(high Pd-Mf discrepancy). 

For the AM group the variance of ACE 
scores is lower and moreover no such rela- 
tionship was predicted. The corresponding 
r of — .22 is not significant. 

The large and statistically significant difference
between the two groups on the Social
Introversion-Extroversion scale of the MMPI
is perhaps further validation for the scale
(2, 3, 4) and reinforces the hypothesis of the
existence of personal factors producing
differences in participation.


Summary and Conclusions 


The purpose of this study was to test the 
hypotheses that active and passive participa- 
tion in the organizational functions of a group 
was related to motivation to belong to the 
particular group, to general tendencies to be 
oriented toward participating in groups, and 
to skill in performing the tasks required by the 
organization. 

The nomination technique was used to 
select 41 girls who were active members and 
37 girls who were passive members from a 
total of 19 college sororities. Self-rating 
scales, sociometrics and test scores provided 
the basic data. 

1. The active members were more attracted 
to their groups and seemed to derive satisfac- 
tions and dissatisfactions more from the or- 
ganizational features of the sorority than the 
passive members. 

2. The active members belonged to more 
student organizations and had more friends 
outside the sorority, indicating a general tend- 
ency to be attracted to organizations and 
to be socially inclined more than the passive 
members. 

3. Using the discrepancy score of the Pd 
minus Mf scales of the MMPI as an indicator 
of strong interest in men which detracts from 
affiliation with the sorority, the passive mem- 
bers turned out to have higher scores. How- 
ever, an alternative hypothesis that this dis- 
crepancy score represents a combination of 
non-conformity (Pd) and absence of interest 
in organizations (Mf) is also plausible. 

4. The lower K scores of the MMPI for 
the passive members are interpreted as the 
presence of lack of confidence which operates 
as a “restraint” against participating. 

5. As measured by the Sie scale of the 
MMPI, the passive members are definitely 
more introverted, indicating a general tend- 
ency for personal factors determining par- 
ticipation in a particular group. 

6. The active members were 20 points 
higher than the passive members on both the 
ACE and HSR. As indirect measures of apti- 
tude and ability for organizational tasks these 
differences lend support to the hypothesis that 
skill is an important factor in participation. 
It is a basic assumption of contemporary 
advertising and politics that people’s judg- 
ments can be swayed by the opinions of in- 
dividuals with high prestige. Clear-cut proof 
or disproof of this assumption in field situa- 
tions is difficult. For example, an attempt to 
test the effects of endorsements by sports 
heroes on the sale of cigarettes would prob- 
ably be beset by many problems. It may be 
of interest therefore to test a parallel assump- 
tion in the laboratory. The effect of one in- 
dividual’s judgments on those of another has 
been studied extensively through the use of 
an experimental design first employed by 
Sherif (9). In this, judgments are made first 
by each S alone, then in a group situation. 
Some of this work has indicated that the de- 
gree to which any one S will abandon his own 
judgment range for that of a partner will de- 
pend, in part, on the characteristics of that 
partner (3, 4, 5). 

The present study was designed to test the 
effect of variation in a partner’s prestige on 
social interaction in such an experimental de- 
sign. Art judgments were chosen because 
these can be demonstrated to be stable in the 
absence of social stimulation, and because the 
manipulation of prestige in the area of art is 
relatively simple. The following experimental 
hypothesis was tested: Ss will be influenced 
more by the judgments of a partner with high 
prestige than by those of one with low 
prestige. 

Method 


A group of forty undergraduate students in 
Washington Square College was given the All- 
port-Vernon Scale of Values. From these 
were chosen three groups of ten Ss matched 
for their percentile scores on the scale of 
aesthetic value (1). All of these Ss were 
then given the Meier Art Judgment Test in- 
dividually. Except for the omission of in- 
formation concerning differences between the 
paired plates, standard instructions were fol- 
lowed. In a second session one week later the 
Ss were told that they were to repeat the test 
in order to determine its reliability. Ss in 
Group I (control) repeated the test substan- 
tially as in the first situation. Ss in Groups 
II and III were told that, to save time, the 
test would be given to pairs. A confederate 
of the experimenter was the second member of 
each pair. He was introduced to Ss in Group 
II as a fellow-student, to Ss in Group III as 
the art director of a local advertising agency 
interested in the results of the test. 

* Formerly at Washington Square College, New 
York University. The writer wishes to acknowledge 
the assistance of Mr. Frank Celentano in the con- 
duct of the experiment. Mr. Ted Climis played the 
roles of “fellow student” and “art director.” 

In the together situation both members of 
the pair judged each of the pairs of pictures; 
the S was first in all even trials, the confeder- 
ate in all odd trials. The confederate had 
memorized the test; he consistently stated the 
preference indicated as “wrong” by the scor- 
ing key. 

Results 


The degree of social influence is expressed 
in terms of the change in the number of 
wrong preferences from the first to the second 
session. Table 1 gives mean wrong judgments 
for both sessions for each group with the odd 
and even trials treated separately. Also in- 
cluded are t’s for the differences between mean 
wrong judgments alone and with a partner.¹ 


Table 1 
Mean Wrong Choices on the Meier Art Judgment Test 
[M1 Gives Results for Pretest; M2: Alone for Group I, 
with “Fellow Student” for Group II, and 
with “Art Authority” for Group III] 

Note: S first on even trials, partner first on odd. 

          Group I          Group II                      Group III 
          Odd     Even     Odd            Even           Odd     Even 
M1        14.6    13.6     [illegible]    [illegible]    10.9    16.1 
M2        14.8    13.9     17.0           14.5           19.6    18.2 
t          .10     .1       3.9            .53           13.0     1.3 
p         >.8     >.8      <.01           >.6            <.001   >.2 
N         10      10       10             10             10      10 

Group I showed no significant change in 
the frequency of wrong judgments from situa- 
tion one to situation two. This would be an- 
ticipated from the high reliabilities reported 
for this test: coefficients of reliability range 
from .70 to .85 (7). Groups II and III 
showed no significant change in the frequency 
of wrong judgments for the even trials where 
S made his choice first. It can be assumed 
therefore that there was no tendency for as- 
sociation with a partner of such deplorable 
taste to affect the general ability of S to dis- 
criminate good from bad pictures. However, 
in the odd trials, where the partner made his 
choice first, both Groups II and III demon- 
strated an increase in the frequency of wrong 
judgments (cf. Table 1). The differences are 
significant at the .01 level for Group II, at far 
better than the .001 level for Group III. 

An examination of the data for individual 
Ss reveals that in Group II two of the ten Ss 
showed fewer wrong responses in the second 
situation than in the first. All of the Ss in 
Group III gave more wrong responses in the 
social than in the individual situations. In- 
terpretation of differences between Groups II 
and III is justified to the extent that they are 
drawn from a relatively homogeneous popula- 
tion, and were matched for aesthetic values. 
The mean increase in wrong responses is sig- 
nificantly greater in Group III than in Group 
II: M_D(II) = 3.8, M_D(III) = [illegible], t = 1.70, 
n = 9, p approaches .05 when the difference is 
evaluated in terms of a one-tailed hypothesis. 
This hypothesis is considered valid because 
the experimental hypothesis presented did not 
involve the probability of decrease in wrong 
responses, and because it was predicted that 
Group III would show more shift than Group 
II. Certainly the difference between a t of 
3.9 for Group II and 13.0 for Group III would 
indicate that the changes should be more re- 
liable for the latter even though there is no 
way of comparing the two t values quantita- 
tively. These results demonstrate that the 
judgments of a partner affected the responses 
of Ss taking the Meier Art Judgment Test, 
and that Ss tended to converge more con- 
sistently towards a partner with high prestige 
(art director, Group III) than towards one 
with little prestige (fellow student, Group II). 


¹ For detailed table showing frequency of wrong 
judgments by each S in each situation, order Docu- 
ment 3917 from ADI Auxiliary Publications Project, 
Photoduplication Service, Library of Congress, Wash- 
ington 25, D. C., remitting $1.25 for microfilm 
(images 1 inch high on standard 35 mm. motion pic- 
ture film) or $1.25 for photoprint readable without 
optical aid. 
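
The significance levels quoted for the within-group changes (t = 3.9 and t = 13.0) and for the one-tailed comparison between groups (t = 1.70) can be checked against the t distribution. The following is a minimal sketch, assuming 9 degrees of freedom throughout (ten Ss per group, and the reported n = 9). The helper names are hypothetical, and the tail area is obtained by numerical integration of the t density rather than from a printed table.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=20000):
    """One-tailed probability P(T >= t), using Simpson's rule on [0, |t|]."""
    b = abs(t)
    h = b / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    upper = 0.5 - total * h / 3        # area beyond |t| in one tail
    return upper if t >= 0 else 1 - upper

def two_tailed(t, df):
    """Two-tailed probability for |T| >= |t|."""
    return 2 * t_upper_tail(abs(t), df)

DF = 9  # assumed degrees of freedom (see lead-in)
print("Group II,  t = 3.9 :", two_tailed(3.9, DF))
print("Group III, t = 13.0:", two_tailed(13.0, DF))
print("II vs. III, t = 1.70, one-tailed:", t_upper_tail(1.70, DF))
```

Under this assumption the first probability falls between .001 and .01, the second below .001, and the one-tailed value for t = 1.70 falls somewhat above .05, which agrees with the statement that the difference only approaches significance.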


Discussion 


The writer has suggested (6) that the group 
judgment situation may be considered to 
create conflict for S between a tendency to 
continue giving his prior judgments, and one 
to agree with his partner. The findings of the 
present study indicate that the latter tendency 
may be affected by the partner’s prestige. 
This does not deny the role of other factors in 
determining the extent of group influence. 
The effect of expert opinion on other kinds of 
judgment has been extensively investigated 
with, unfortunately, no consistent results; in 
some work “experts” were less effective in 
shifting judgment than “ordinary people,” 
in others more effective (8, pp. 946-980). 
The differences among these reports may be 
due on the one hand to unreported or un- 
analyzed differences in other determinants of 
interaction: the stimulus factor, the nature 
of the response, the personalities of the Ss 
themselves. On the other hand, this variation 
in the reported results may be due to an in- 
adequate specification of the nature of pres- 
tige (2). The teacher may be prestigeful in 
some areas, not in others. The “expert” may 
exert maximum influence only when his “ex- 
pertness” is accepted by S as genuine. Pres- 
tige, thus, might have to be measured in terms 
of influence on judgment. 

One possible way of avoiding this circu- 
larity would be to attempt to relate various 
independent determinants of prestige to de- 
gree of convergence between members of a 
coacting pair of Ss. These could include 
status in a hierarchy, group membership, or 
past history of contact between Ss. In the 
present study prestige is varied by means of 
instructions to S regarding his partner’s group 
membership (expert vs. non-expert). The 
results indicate at least that under relatively 
controlled conditions it is possible to produce 
consistent variation in the degree of social in- 
teraction as a function of prestige manipulated 
in this manner. In further investigations at- 
tempts will be made to vary the prestige fac- 
tor more extensively through variation in 
other determinants. 

From a practical point of view, the present 
findings, with an obvious extrapolation, sup- 
port the reliance on “expert testimonial” 
which has long been accepted practice in busi- 
ness and politics. However, the above dis- 
cussion indicates that a caution is necessary. 
It is not always possible as yet to predict 
when an externally labelled authority will ac- 
tually be accepted as an authority and will be 
able to exercise appreciable influence on judg- 
ment. 


Summary and Conclusions 


In a test of the effect of variation in one 
partner’s prestige on the interaction of ob- 
server pairs, three groups of ten Ss, equated 
for interest in art by means of the Allport- 
Vernon Scale of Values, were given the Meier 
Art Judgment Test. Ss in Group I repeated 
the test alone; Ss in Group II and Group III 
repeated it with a partner. He was introduced 
to Group II as a fellow student, to Group III 
Ss as an “art authority.” The partner in all 
cases made choices indicated as wrong by the 
scoring key. 

Degree of social influence was measured in 
terms of the shift in frequency of wrong judg- 
ments from the “alone” to the “social” situa- 
tion. Group I (control) showed no significant 
shift in mean number of wrong judgments. 
Both Groups II and III showed an increase 
significant for Group II at the .01 level, for 
Group III beyond the .001 level. Comparison 
of the two groups using a one-tailed test of 
significance shows Group III (art authority) 
giving a greater increase of wrong judgments 
than Group II (fellow student). This differ- 
ence approaches significance at the .05 level. 

It is concluded that the judgments of Ss 
taking the Meier Art Judgment Test were af- 
fected by the responses of coacting partners, 
and that this effect was a positive function of 
the partner’s prestige. 


Received November 21, 1952. 
The purpose of this study is to evaluate 
the relative effectiveness of four alternative 
methods for conducting brief critiques of a 
short problem-solving exercise designed to 
assist groups (air crews) to function more 
effectively as groups. 

In many training situations, both military 
and civilian, it is necessary to conduct brief 
on-the-spot critiques of a group’s performance. 
Instructors of the Advanced Strategic Air 
Command Survival School, the scene of the 
present study, are faced with this problem 
many times during the course of the field 
training of each crew they instruct. In all of 
these situations, there is the problem of how 
much guidance by the instructor or expert 
produces the best results. Can a crew effec- 
tively criticize itself and improve its problem- 
solving performance, or is the assistance of the 
expert necessary? When the expert conducts 
the critique, should he be the evaluator or 
should he keep the locus of evaluation within 
the crew? 


Theoretical Considerations 


Much has been written in the areas of coun- 
seling and guidance and industrial training 
about techniques applied to the individual to 
bring about proper evaluation and improved 
adjustment or performance. One set of con- 
siderations deals with the locus of evaluation. 
One group, of which Rogers is the chief 
spokesman, holds that only when the locus 
of evaluation is in the individual does real 
growth and development take place (20). 
According to this theory, an evaluation by an 
expert or an evaluation resulting from a test 
would remove the locus of evaluation from 
the individual and would not result in de- 
velopment and growth. Essentially the same 
theory is represented in the work of Cantor 
(1, 2), Maier (14), Lippitt (12), French 
(4), Katzell (10), Haas (5), and others. 

If one were to apply this theory to the 
problem of critiques, the superior method 
would be expected to be one in which the 
leader assumes a non-evaluative role and 
stimulates the group to evaluate its perform- 
ance and discover improved methods. 

A second set of considerations centers about 
the role of group decision in changing be- 
havior. Recent findings in industrial research 
and nutritive education research (6, 8, 11) 
indicate that group discussion as such results 
in very little change in behavior, while group 
decision as a component of group discussion 
brings about considerable change. In these 
experiments, scientifically developed informa- 
tion was given by the expert as it was needed 
but the decision was left to the group. Haire 
(6) points out, however, that group decision 
does not work with passive or apathetic 
groups, although its use almost always stimu- 
lates a desire for participation and eventually 
changes the apathy. 

A number of experiments have explored 
situations and leadership techniques which set 
up resistance or retard growth, and others 
which win acceptance or stimulate growth. 
The problems of resistance have been treated 
by Zander (22), Torrance (20) and Coch and 
French (3). All emphasize the importance of 
respecting the individuals or groups involved. 
A variety of methods are discussed by Maier 
(14, 15, 16), Cantor (1, 2), Haas (5), Haire 
(6), Lippitt (12), and Rogers (18). There 
seems to be agreement that improved per- 
formance does not result merely through 
reading or hearing lectures. More active 
participation methods, such as through dis- 
cussion and role playing procedures, are re- 
quired. 

The skill of the leader must also be con- 
sidered as a factor. A series of experiments 
conducted by Maier (17, p. 170) showed that 
“a leader, if skilled and possessing ideas, can 
conduct a discussion so as to obtain a quality 
of problem-solving that surpasses that of a 
group working with a less skilled leader and 
without creative ideas. Further, he can ob- 
tain a higher degree of acceptance than a less 
skilled person.” 

Maier concludes, however, that “even an 
unskilled leader can achieve good quality solu- 
tions and a high degree of acceptance” using 
democratic leadership. In another experi- 
ment (16), he demonstrated the superiority 
of the permissive discussion leader over the 
self-critique discussion with an observer pres- 
ent. Maier maintains that the major part of 
the difference was due to the relatively greater 
influence exerted by individuals with minority 
opinions in the “leader” groups than in the 
“observer” groups. “A discussion leader can 
function to up-grade the group’s thinking by 
permitting an individual with a minority 
opinion time for discussion” (16, p. 287). 


Method and Procedure 


Subjects. The subjects of the experiment were 
57 combat air crews undergoing training at the 
Strategic Air Command’s Advanced Survival 
School at Stead Air Force Base, Nevada. Most 
of these crews were B-29 (11 men) crews, but a 
few B-50 (10 men) and B-36 (usually about 15 
men) crews were also included. Most of the 
crews had been functioning as crews for about 
four months, although some had been together 
for two or more years. 

Problem-Solving Exercises. Two of the Intel- 
lectual Talents Tests (401-B and 701-X) devel- 
oped by the Human Resources Research Labora- 
tories were used. Both tests are thought to tap 
common-sense judgment and are alike in that 
each presents the examinee with problem-situa- 
tions too complex for solution by any step-by- 
step logical reasoning process and requires the 
examinee to select the most essential or most 
critical of the many elements presented in the 
problem-situation. The problem-situations are 
rather commonplace and can be solved on the 
basis of knowledge gained from background ex- 
periences common in most persons’ lives. Differ- 
ences in the 401-B and the 701-X are that the 
701-X consists of a larger number of shorter 
problems and permits an unlimited number of 
choices. 

Experimental Procedures. The crews were 
tested in tents measuring 16 feet by 32 feet on 
the first day of their training. Each crew was 
first given an orientation regarding the nature 
and purpose of the test. Following this, each 
member of the crew was asked to make an esti- 
mate of his crew’s performance. The first prob- 
lem-solving test was then administered, after 
which a post-test estimate of crew performance 
was obtained from each crew member. 

Following this, a critique of the first problem- 
solving performance was conducted by one of the 
following methods: 

1. Unstructured non-authoritarian or crew-cen- 
tered critique: The crew was asked to evaluate 
and discuss its own performance. Discussion was 
centered on both the decision as to method and 
the way it was reached, as well as the way the 
decision was executed. The experimenter tried 
to stimulate discussion and encourage crew mem- 
bers to evaluate their performance, but the ex- 
perimenter did not evaluate their performance. 
The experimenter accepted questions but referred 
them back to the crew. The attitude of the 
experimenter was definitely non-authoritarian. 
Techniques used were similar to those described 
by Cantor (1, 2), Maier (14), and Rogers (19). 

2. Directive or expert critique: The experi- 
menter diagnosed the performance of the crew 
according to a set of 13 rating scales (listed 
later), pointed out ineffective procedures, and 
suggested ways of improvement. He stated that 
through research, certain characteristics have 
been found to differentiate between crews which 
operate effectively and those which do not. The 
analysis included both the way the group went 
about making its decision and what they decided, 
as well as how they worked together to carry out 
the decision. 

The experimenter took a very active role, as- 
suming the role of the “expert.” He tried, how- 
ever, to give his advice in the most tactful way 
possible. He, nonetheless, gave definite evalua- 
tions and advice. The experimenter accepted 
questions and answered them as an “expert.” 

3. No critique: The experimenter went ahead 
and administered the California F-Scale which 
required about 15 minutes, before administering 
the second problem-solving test. 

4. Self-critique: Time was allotted for a critique 
and the experimenter left the tent, returning after 
15 minutes. 

5. Structured non-authoritarian or crew-cen- 
tered critique: The experimenter used the set of 
rating scales as a guide in getting the crew to 
evaluate itself and discover more effective ways 
of performing. The locus of evaluation was still 
within the crew, however. 

Following the 15 minute critique period, the 
second problem-solving test, the 701-X, was ad- 
ministered. The rules were the same as for the 
first problem except that the time limit was ten 
minutes. 

Observations and Ratings. After each of the 
two problem-solving tests, the experimenters com- 
pleted a set of five-point rating scales following 
a set of descriptive scales on each of the follow- 
ing characteristics: (1) Organization of man- 
power; (2) Selective use of personnel; (3) Su- 
pervision; (4) Participation in decision-making; 
(5) Acceptance of suggestions or criticisms; (6) 
Consideration of available time; (7) Checking 
work; (8) Leadership function; (9) Survey of 
the situation; (10) Understanding instructions; 
(11) Group atmosphere; (12) Speed of reaction 
to the problem situation; and (13) Officer-air- 
men relations. 


Results 


A problem-solving score was computed for 
each crew on both of the problem-solving 
tests, using the scoring formulae already in 
use for these tests. A performance rating was 
also computed for each crew on both of the 
problem-solving situations by adding the 
thirteen ratings made by the examiner. In 
order to hold constant scores and ratings for 
the first problem-solving test and to determine 
if the variance in scores and ratings is due to 
the method of conducting the critique, analy- 
ses of co-variance were then carried out both 
for ratings and for scores. Using the ratings, 
the variance for critique methods was found 
to be statistically significant at the one per 
cent level of confidence (F = 4.968). Using 
problem-solving scores, however, the variance 
was not statistically significant at less than 
the five per cent level of confidence (F 
= 1.957). Because of the small number of 
crews critiqued by each experimenter by each 
method, it was not possible to compute 
the interaction of experimenter and critique 
method. 

Crews participating in the unstructured 
non-authoritarian critique were combined with 
those participating in the self-critique and 
crews participating in the expert critique were 
combined with those participating in the struc- 
tured non-authoritarian critique in order to 
study the effect of structure vs. non-structure 
in critiques. Analysis of co-variance revealed 
that the variance due to structure is significant 
at the five per cent level both for ratings (F 
= 5.664) and for scores (F = 5.124). Analy- 
sis of co-variance also showed that the vari- 
ance due to different experimenters is not 
statistically significant (F = 0.429) for rat- 
ings and for scores. 

In order to study relative improvement in 
performance which might be attributable to 
differences in methods of conducting critiques, 
each crew was ranked in order from one to 
fifty-seven on each of the four variables (score 
on 401-B, score on 701-X, ratings on 401-B 
performance, ratings on 701-X performance). 
Crews were then divided equally into a most 
improvement category and a least improve- 
ment category on ratings and on scores. 
Table 1 shows the percentage falling into each 
category according to method of conducting 
the critique for both ratings and scores. 
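
The ranking and division into improvement categories can be sketched as follows. The text does not spell out how improvement was derived from the ranks, so this sketch assumes it is the gain in rank from the first test to the second, with the half of the crews showing the largest gains placed in the "most improvement" category. The function names and the six-crew data are hypothetical and purely illustrative.

```python
def ranks(values):
    """Rank values from 1 (highest) downward; ties keep their original order."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    out = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out

def improvement_split(first, second):
    """Label each crew 'most' or 'least' improved by gain in rank standing."""
    r1, r2 = ranks(first), ranks(second)
    gain = [a - b for a, b in zip(r1, r2)]   # positive gain = moved up in standing
    order = sorted(range(len(gain)), key=lambda i: -gain[i])
    labels = ["least"] * len(gain)
    for i in order[: len(gain) // 2]:        # top half by gain (assumed rule)
        labels[i] = "most"
    return labels

# Hypothetical scores for six crews on the first and second tests.
first_test = [12, 30, 25, 18, 22, 9]
second_test = [20, 28, 31, 15, 27, 10]
print(improvement_split(first_test, second_test))
```

With an odd number of crews, as in the present sample of 57, an exactly equal division is not possible; this sketch places the extra crew in the "least improvement" category, which is an arbitrary choice.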

The t-test of significance of differences in 
percentage reveals the superiority of the ex- 
pert critique over the non-authoritarian cri- 
tique (significant at the .001 level of con- 
fidence), no critique (significant at the .01 
level), and the self-critique (significant at the 
.02 level). The difference in percentages be- 
tween the expert critique and the structured 
non-authoritarian critique is not statistically 
significant. The latter tends to be more fre- 
quently followed by improvement than are 
the unstructured non-authoritarian critique 
(significant at the .01 level of confidence), no 
critique (significant at about the .10 level of 
confidence), and the self-critique (not statis- 
tically significant). 
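
With groups of only 11 or 12 crews, the comparisons of percentages can also be examined with an exact test. The sketch below substitutes Fisher's exact test for the t-test of percentages used in the study, so its probabilities need not match those reported. The crew counts are an assumed reading of the ratings percentages: 91 per cent of 11 crews (10) for the expert critique, 9 per cent of 11 (1) for the unstructured non-authoritarian critique, and 33 per cent of 12 (4) for no critique. The function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact probability for the 2 x 2 table [[a, b], [c, d]].

    Rows are methods; columns are 'most' and 'least' improvement counts.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x 'most improvement' crews in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Assumed counts (most, least) derived from the reported percentages.
p_expert_vs_unstructured = fisher_exact_two_sided(10, 1, 1, 10)
p_expert_vs_no_critique = fisher_exact_two_sided(10, 1, 4, 8)
print(p_expert_vs_unstructured, p_expert_vs_no_critique)
```

On these assumed counts the first comparison falls below .001 and the second below .01, in line with the levels stated for the expert critique against the unstructured critique and against no critique.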

Table 1 

Comparison of Effectiveness of Methods of Conducting Critiques 

                                                      Percentage showing “most improvement” 
Method of critique                                    in standing on scores    in standing on ratings 
Expert Critique (11 crews)                                     73                       91 
Unstructured Non-Authoritarian Critique (11 crews)             36                        9 
Structured Non-Authoritarian Critique (11 crews)               73                       64 
Self-Critique (12 crews)                                       33                       50 
No Critique (12 crews)                                         33                       33 


The situation in regard to improvement on 
problem-solving scores is about the same as 
for ratings, except that the superiority of the 
expert critique is not as clear. The t-test of 
significance of the difference in percentage 
shows that the expert and structured non- 
authoritarian methods are superior to the un- 
structured non-authoritarian, the self-critique 
and no critique at about the .02 level of con- 
fidence. The unstructured non-authoritarian 
method and the self-critique appear to have 
no superiority over no critique. 


Discussion 


The fact that the structured non-authori- 
tarian is superior to the unstructured non- 
authoritarian method and that the expert 
method is not superior to the structured non- 
authoritarian method would suggest that the 
locus of evaluation is not important in the 
type of critique studied in this experiment. 
Of course, it may be that even though the 
“expert” makes evaluations, the crew still 
makes its own evaluations and does not sur- 
render its evaluative function to the expert 
as readily as some might suppose. A close 
examination of crews subjected to the expert 
method and making little improvement in- 
dicates that some of the evaluations given by 
the “expert” were definitely rejected by the 
crew. The crucial thing may be the giving of 
evaluations that can be accepted rather than 
the giving or not giving of evaluations. 

The issue of group decisions does not be- 
come crucial in this experiment since in every 
case the decision was left to the crew, al- 
though that decision may have been made by 
one person, usually the aircraft commander. 
In using the unstructured non-authoritarian, 
however, it was observed by almost all of the 
experimenters that a crew would recognize 
and discuss improved solutions and even ap- 
pear to give general approval to these solu- 
tions. Yet, when the time came to decide how 
to organize for the second problem, the Air- 
craft Commander would simply say, “We'll 
do it the same way we did the other one.” 
This may explain why this method is not 
more effective than no critique of any kind. 

In regard to the overcoming of resistance, 
the less structured methods are least effective. 
It must be mentioned, however, that some of 
the crews which made the most outstanding 
improvement were crews using the self-cri- 
tique. The difficulty is that not all crews are 
able to look objectively at their performance 
and discover more effective ways of working 
together. Most crews seem to require enough 
structure or guidance to assure that their 
evaluations and considerations will be con- 
cerned with the salient elements. This does 
not in any way deny the importance of the 
participation and involvement of the group. 
It does, however, emphasize the importance of 
the “expert” and the nature of the role he 
must play in order to be effective where single 
trial, immediate performance is concerned. 

Although the variation due to experimenter 
differences was not significant, differences in 
the success of experimenters were observed. 
For example, 70 per cent of the crews cri- 
tiqued by two of the experimenters were in the 
“most improvement” category while only 25 
per cent of the crews of another experimenter 
were in this category (significantly different 
at about the 5 per cent level of confidence). 
The least well trained experimenter differed 
very little from the best trained experimenters. 

The results would appear to have important 
implications for training of many types, espe- 
cially training of the on-the-job variety in 
industry, education, and the military services. 
Although there are a number of questions 
which need to be subjected to further study, 
the results of this study seem to point the way 
to using structured critiques where decisions 
are still left to the group, where final evalua- 
tion is left to the group, but where the trainer 
can help guide the evaluative process. This 
study also suggests several directions for 
further research which are being pursued 
through a series of additional studies now 
under way. These studies are concerned with 
the role of the expert, the decision-making 
techniques of the group’s usual leader, spread 
of learning within the group, and transfer of 
learning to more different situations. 


Summary 


A total of 57 combat air crews undergoing 
survival training were divided randomly into 
four experimental groups and one control 
group. Each experimental group was admin- 
istered a problem-solving test, critiqued ac- 
cording to one of four methods, and then ad- 
ministered a second problem-solving test. The 
control group was given no critique between 
the two problem-solving tests. 

Crews obtained scores on both of the prob- 
lem-solving tests and ratings of manner of 
performance on both of the tests. 

Analysis of covariance indicates statis- 
tically significant variances in ratings due to 
method of conducting critiques. Analysis of 
covariance indicates statistically significant 
variance in both scores and ratings due to 
structuring the critique but no statistically 
significant variance due to experimenters. 

Crews critiqued according to the more 
highly structured methods are more fre- 
quently followed by “greater improvement” 
than are crews critiqued according to the less 
highly structured methods. Crews participat- 
ing in the unstructured non-authoritarian and 
the self-critique do not perform significantly 
better than crews receiving no critique. 


Received November 24, 1952. 


Logical Reasoning: With and Without Training * 


William J. Morgan and Antonia Bell Morgan 
Aptitude Associates, Merrifield, Virginia 


It is very strange indeed that psychologists 
have paid so little attention to problems of 
logical reasoning. During the last 50 years 
no systematic and comprehensive approach 
has been undertaken by them toward these 
problems. We made a careful search of the 
literature since 1927 and found 21 references 
to experimental studies of logical reasoning, 
and we were rather generous in our interpre- 
tation of what constitutes an experimental 
study. Generally speaking, therefore, psy- 
chologists have been contributing less than 
one experimental study per year on prob- 
lems of logical reasoning. But they con- 
tinue to speculate and philosophize, not quite 
so often as the philosophers themselves, on 
the characteristics of this mental process. It 
is difficult to understand why supposedly 
hard-bitten, scientifically-minded psychologists 
have given so little attention to this problem. 
Perhaps in their failure to exploit the 
findings of Störring (7, 8, 9) and Eidens (2), 
they became discouraged. Psychologists seem 
to be under the delusion that logical reason- 
ing is confined to the syllogism, a view which 
has long been abandoned by the logicians 
themselves. 

It may also be that psychologists have not 
been willing to undertake experimental studies 
of logical reasoning, because they were, for 
such a long time, desperately trying to divorce 
themselves from the influence of philosophy, 
and, of course, logic is an integral discipline 
of philosophy. Whatever may be the reasons 
for the paucity of experimental studies of 
logical reasoning, it seems to be a fact that 
mathematicians rather than psychologists are 
concerning themselves with logic. In spite 
of the stress laid on logical reasoning, espe- 
cially the deductive aspects, by Professor 
Clark Hull in his establishment of a behavior 
system, psychologists seem content to be- 
lieve that logic, like any other game or sport, 
is free to set up its own rules of how the game 
will be played. But unlike chess, or poker, 
or basketball, the rules of logic are basic to 
science. As H. M. Johnson, himself a psychologist, 
has said (3, p. 74): “No artful 
manipulation of symbols according to prescribed 
rules can make good logic out of bad 
logic . . . . The structure of science as we 
know it is predetermined by the definitions, 
postulates, and rules of manipulation of symbols 
that we call modern logic. This logic 
includes the whole of the traditional or Aristotelian 
logic, cleared of certain well known 
defects; it includes a great deal that Aristotle 
and his imitators overlooked. . . . we may be 
sure that if any procedure assumes equivocation, 
affirming the consequent, denying the 
antecedent to be valid, then it does not yield 
a set of rules for ‘scientific inference.’” 

* This paper was presented before Section I, 
Psychology, of the American Association for the 
Advancement of Science at its annual meeting, in 
St. Louis, Missouri, 30 December 1952. 

In view of the absence of too much experi- 
mentation on logic by psychologists, it is not 
surprising to find cropping up some rather 
far-fetched notions about the nature of logi- 
cal reasoning. In the chapter on Speech and 
Language in Stevens’ Handbook of Experi- 
mental Psychology (4, p. 806), Professor G. 
A. Miller says: “The fact is that logic is a 
formal system, just as arithmetic is a formal 
system, and to expect untrained subjects to 
think logically is much the same as to expect 
preschool children to know the multiplication 
table.” 

This sentence, an argument by analogy, 
contains a number of interesting implications 
which might be subjected to analysis but we 
shall restrict ourselves to the assertion con- 
tained therein that untrained subjects can- 
not be expected to think logically. 

When we use the word “logic” we accept the 
definition given by Warren’s Dictionary of 
Psychology (10) where logic is defined as the 
“principles that enable an individual to make 
judgments or conclusions which are consistent 
with the data at hand.” 


Subjects and Procedures 


The Morgan Test of Logical Reasoning (5) 
was administered to 134 adults, all employed 
by the United States Government. This test, 
which was first developed in 1946 for the test- 
ing of superior adults, contains 75 true-false 
items in verbal form. The scoring formula 
is Right minus Wrong. The subjects in this 
study were allowed 30 minutes. These are a 
few sample items from the test: 


(a) All highly successful businessmen are prac- 
tical psychologists. Therefore, some practical 
psychologists are highly successful businessmen. 

(b) Most executives are college graduates. 
The majority of executives are Republicans. 
Therefore, most college graduates are Repub- 
licans. 

(c) If we rearm Germany, the French will op- 
pose us, and if we fail to maintain air bases in 
East Anglia we shall incur the resentment of the 
British. But it is essential to retain the good 
will of either France or Britain. Therefore, we 
must maintain our East Anglian air bases or else 
abandon plans for the rearmament of Germany. 

(d) No person interested in treating human 
ailments has failed to study Professor Pavlov’s 
book on the nature of the digestive juices—a 
book that won the Nobel prize. No person who 
has failed to study Professor Pavlov’s book is a 
physician. Therefore, although they may have 
other interests, it can be said that all physicians 
are interested in treating human ailments. 

(e) Many women are high-strung and emo- 
tional. A high-strung and emotional tempera- 
ment is frequently a barrier to clear and logical 
reasoning. Therefore, many women are unable 
to reason logically. 

(f) You can fool some of the people all the 
time. You can fool all the people some of the 
time. Therefore, you cannot fool all the people 
all the time. 


The subjects consisted of two groups (WL 
and WOL) of 67 each. All subjects in Group 
WL (With Logic) had had at least three 
semester hours of college training in logic. 
No subjects in Group WOL (Without Logic) 
had had any training in logic. Each person 
in Group WL was paired in terms of sex, age, 
and college degree(s) with a person in Group 
WOL.¹ In each group there were 58 males 
and 9 females with a mean age of 27 years 
and a standard deviation of 5.0. The oldest 
was 42, youngest 20, median 27. There were 
in each group 43 with a Bachelor’s, 16 with 
a Master’s, and 8 with an LL.B. degree (plus 
the Bachelor’s). In addition to Groups WL 
and WOL, there were 9 subjects with a Ph.D. 
degree, 7 males and 2 females. The oldest 
was 57, the youngest 26, the median 33, and 
the mean age 36 years. None of the Ph.D.’s 
had had any training in logic. 

¹ The pairs were matched in terms of educational 
achievement as measured by college degrees, rather 
than in terms of scholastic ability. However, for 61 
cases in the group with logical training and for 65 
cases in the group without logical training we have 
statistics derived from the Verbal Intelligence Test 
(published by Aptitude Associates, Merrifield, Va., 
copyright 1948). This test is scored by summating 
the rights, and the maximum score is 50. The mean 
score for Group WL was 38.3 (SD, 7.7); and the 
mean for Group WOL was 32.1 (SD, 9.5). The 
Critical Ratio (D/σ_D) was 4.0. This difference is 
significant at the one per cent level. 


Results 


The lowest score in Group WL was −2, the 
highest 67, mean 29.1, and the standard deviation 
14.0. The lowest score in Group WOL 
was −7, the highest 48, mean 21.2, and the 
standard deviation 11.2. The means were 
significantly different beyond the 1% level. 

By comparing the mean score for Group 
WL with the mean score for Group WOL, it is 
found that Group WOL did 73 per cent as 
well as Group WL. Since this test is scored 
Right minus Wrong, if a person is guessing 
throughout, he should get a zero score, all 
other things being equal. If a person does not 
know how to reason logically, he would have 
to guess on this test on every item, and we 
would expect him to get a zero score. But 
what do we find? Instead of a zero score, 
these college graduates who did not have the 
benefit of formal training in logic were ac- 
tually able to achieve a mean score of 21.2 
compared with a mean score of 29.1 for those 
who had had training in logic. In other words 
they did 73% as well as the group which had 
received training in logic. This is a far cry 
from zero. 
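
The guessing argument above can be checked with a short calculation. The sketch below assumes that a subject with no logical ability answers all 75 true-false items by fair coin flips and omits none; that is an assumption, since the text says only that such a person "would have to guess." It computes the exact distribution of the Right-minus-Wrong score under that assumption, together with the ratio of the two reported group means. The name `guessing_distribution` is illustrative only.

```python
from fractions import Fraction
from math import comb, sqrt

N_ITEMS = 75  # number of true-false items, as stated in the text


def guessing_distribution(n_items):
    """Exact distribution of Right-minus-Wrong when every answer is a fair coin flip.

    Assumes every item is attempted, so k right answers give a score of k - (n - k).
    """
    total = 2 ** n_items
    return {2 * k - n_items: Fraction(comb(n_items, k), total)
            for k in range(n_items + 1)}


dist = guessing_distribution(N_ITEMS)
expected_score = sum(score * p for score, p in dist.items())
variance = sum((score - expected_score) ** 2 * p for score, p in dist.items())

# Reported group means on the logic test.
mean_wl, mean_wol = 29.1, 21.2
relative_performance = mean_wol / mean_wl

print("expected score under pure guessing:", expected_score)
print("standard deviation under pure guessing:", round(sqrt(variance), 2))
print("WOL mean as a percentage of WL mean:", round(100 * relative_performance))
```

Under these assumptions the expected score is zero, with a standard deviation of about 8.7 points for a single guesser, so a group mean of 21.2 lies well above what blind guessing would be expected to produce.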

Although Group WL obtained a higher 
mean score on the test than Group WOL, it is 
remarkable that of the LL.B.’s, 7 of the 8 who 
had not received training in logic obtained 
higher scores than their paired partners; of 
the Master’s, 6 of the 16 who had not re- 
ceived training in logic did better on the test 
than their paired partners; of the Bachelor’s, 
13 of the 43 who had not received training in 
logic obtained higher scores than their op- 
posite numbers in the WL Group. In other 
words, 26 of the 67 subjects, i.e., 38%, in the 
WOL Group did better than their paired part- 
ners who had received training in logic. 
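
The pair counts in this paragraph can be tallied directly. The following minimal sketch uses only the figures reported above; the variable names are illustrative. It gives the share of pairs, by degree and overall, in which the untrained partner scored higher.

```python
# (pairs in which the untrained partner scored higher, total pairs) per degree
# level, as reported in the text.
pairs_by_degree = {
    "LL.B.": (7, 8),
    "Master's": (6, 16),
    "Bachelor's": (13, 43),
}

wol_wins = sum(wins for wins, _ in pairs_by_degree.values())
total_pairs = sum(total for _, total in pairs_by_degree.values())

for degree, (wins, total) in pairs_by_degree.items():
    print(f"{degree}: {wins}/{total} = {100 * wins / total:.1f}%")

overall = 100 * wol_wins / total_pairs
print(f"all pairs: {wol_wins}/{total_pairs} = {overall:.1f}%")
```

The overall share works out to 26 of 67, or about 38.8 per cent, which the text reports as 38%. The breakdown also shows that the result is driven largely by the small LL.B. subgroup, where 7 of 8 untrained partners scored higher.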

The lowest score for the Ph.D.’s was 23, the 
highest score 45, mean 32.7, and the standard 
deviation 7.4. There was far less variability 
in scores in the Ph.D. Group by comparison 
with either Group WL or Group WOL. The 
mean score for the Ph.D.’s is significantly 
higher at the 1% level than the mean score 
for Group WOL. The mean score for the 
Ph.D.’s is also higher than the mean score 
for Group WL, and the chances are 88 out 
of 100 that the difference is significant. 
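
The group comparisons in this section can be approximately re-derived from the reported means, standard deviations, and group sizes. The sketch below assumes the critical ratio is the difference between means divided by the independent-samples standard error, and that "chances out of 100" refers to the one-tailed normal probability. Both are assumptions, since the paper does not state the formula used, and Groups WL and WOL were in fact matched pairs. The function names are illustrative.

```python
from math import erf, sqrt


def critical_ratio(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two means divided by its standard error (D / sigma_D).

    Uses the independent-samples standard error; this is an assumption, since the
    source does not say which formula was applied to the matched pairs.
    """
    se_diff = sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se_diff


def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# (mean, SD, n) for each group, as reported in the text.
WL = (29.1, 14.0, 67)
WOL = (21.2, 11.2, 67)
PHD = (32.7, 7.4, 9)

cr_wl_wol = critical_ratio(*WL, *WOL)
cr_phd_wol = critical_ratio(*PHD, *WOL)
cr_phd_wl = critical_ratio(*PHD, *WL)

for label, cr in [("WL vs WOL", cr_wl_wol),
                  ("Ph.D. vs WOL", cr_phd_wol),
                  ("Ph.D. vs WL", cr_phd_wl)]:
    print(f"{label}: CR = {cr:.2f}, chances in 100 = {100 * normal_cdf(cr):.0f}")
```

Under these assumptions the first two ratios (about 3.6 and 4.1) exceed 2.58, the two-tailed 1% point of the normal curve, and the Ph.D. versus WL ratio of about 1.2 corresponds to roughly 88 chances in 100, in line with the figures reported.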


Conclusions 


1. In the majority of cases, college grad- 
uates who have had at least three semester 
hours of college training in logic obtain higher 
scores on a test of logical reasoning than col- 
lege graduates who have not had courses in 
logic. 

2. Professor Miller’s hypothesis that un- 
trained subjects cannot be expected to think 
logically is not substantiated, however, be- 
cause: (a) 38% of the subjects who had had 
no training in logic obtained higher scores 
than their paired partners who had had col- 
lege training in logic; (b) subjects without 
college training in logic did 73% as well, as a 
group, on the test of logical reasoning as those 
who had had training in logic; and (c) college 
graduates with Ph.D. degrees who had not 
had college courses in logic obtained higher 
scores, as a group, than college graduates with 
B.A., M.A., and LL.B. degrees who had had 
courses in logic. 

3. Professor Miller's hypothesis might be 
restated to read, “In the majority of cases, 
untrained subjects cannot be expected to 
think as logically as trained subjects.” 


Further Implications 


There are two problems that need to be 
explored by further research: (1) Are students 
with facility in clear thinking the ones who 
are usually attracted to courses in logic? and 
(2) To what extent is proficiency in logical 
reasoning the result of formal courses in logic 
rather than attributable to other factors, such 
as the subject’s native intelligence? (11). 

We have found in other studies that scores 
on tests of logical reasoning always correlated 
positively, often substantially, and sometimes 
very highly with scores on group tests of in- 
telligence such as the Henmon-Nelson, the 
Thurstone ACE, the Miller Analogies, and the 
Verbal Intelligence Test. Other investigators 
such as Wilkins (12, p. 28), Burt (1, p. 237), 
and Sells (6, p. 23) have obtained similar results. 
We are, therefore, inclined to suggest 
that the ability to think logically is, to a cer- 
tain degree, an aspect of intelligence.* 

We believe that the potentiality for learn- 
ing to reason logically is dependent upon the 
native intelligence of the individual, and the 
rules by which logical reasoning is governed 
are learned in the daily experiences of life, 
sometimes in the classroom, with or without 
the benefit of instruction in formal logic. It 
would be desirable to find out what experi- 
ences and courses, other than formal courses 
in logic, increase the student’s proficiency in 
logical reasoning. It is our opinion that some 
courses, such as mathematics, even though 
not labeled as courses in logic, may have con- 
siderable “carry-over” value to logical rea- 
soning. 


Received December 31, 1952. 


This inquiry was aimed at establishing the 
relative rates of recall, under normal listening 
conditions, of a given advertisement placed 
(a) at the beginning of a program (beginning 
advertisement) and (b) in the middle of a 
program (interruption advertisement). It 
was, in effect, an inquiry into the relative rates 
of recall of a beginning advertisement (B) 
and an interruption advertisement (I). This 
comparison was made in respect of normal 
(N) or at-home listening conditions and not 
in respect of the situation in which people 
listen with the intention of learning. What- 
ever the relative merits of the two adver- 
tisements in the latter situation, there are 
plausible grounds for theorizing that under N 
listening conditions various subjective evoca- 
tions such as hostility, inattention and certain 
defense mechanisms tend to enter effectively 
into the perception processes to the special 
detriment of I. 

The grounds for this theory are two-fold. 
In the first place, the interruption type of ad- 
vertisement emerged, in a preceding survey in 
Sydney, as the most disliked form of radio 
advertising and, indeed, as a source of no 
little hostility (2). Second, the work of Bartlett, 
Levine and Murphy, Rapaport and others 
(1, 7, 9, 10) has indicated the importance to 
perception of affective tendencies, partisan- 
ship and various personal factors. 

While, however, a superiority of B over I 
would conform to the theory, such a comparison 
would not provide a crucial test. Differences 
in recall between B and I could, in fact, 
arise out of conditions other than differential 
subjective evocations. To provide a crucial 
test, it was necessary to examine differences 
in recall of B and I first where those subjective 
evocations peculiar to the N situation 
were operative and secondly where they were 
eliminated. The latter situation required, in 
fact, the development of a learning set (L) 
in relation to B and I. 

* While the inquiry is presented here in comparative 
isolation, it was in fact conducted as part of a 
wider investigation into the relation of attitude to 
recall in radio advertising. Findings from this wider 
study will be introduced only where they contribute 
to the interpretation of the present results. The 
investigation was carried out in Sydney, Australia, 
where commercial broadcasting predominates (2). 
The author is now studying for the Ph.D. at 
Birkbeck College. 


Method 


Design. Four groups G1, G2, G3 and G4 were 
matched according to intelligence, age, sex, 
occupation and general background. Two of them, 
G1 and G2, heard B and I (respectively) under 
N conditions. This meant that there was an 
attempt to evoke in them N reactions (i.e., NB and 
NI). The other two matched groups, G3 and 
G4, heard B and I (respectively) after the 
development in them of L. Each of the four 
groups was subsequently tested for recall of the 
advertisement to which it had been exposed. 
Difference in recall (R) between G1 and G2 (i.e., 
$R_{NB} - R_{NI}$) represents the advantage in recall of 
one placement over the other. The full extent of 
the difference in recall which may be attributed to 
differential N reactions is equal to 
$(R_{NB} - R_{NI}) - (R_{LB} - R_{LI})$. 
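
The quantity defined at the end of this paragraph is a difference of differences: the advantage of B over I under normal listening, minus the same advantage under a learning set. As a minimal sketch, the calculation below applies it to the percentage recall figures quoted in the Results section (21% and 12% under N, 31% and 42% under L). The function and variable names are illustrative and do not appear in the paper.

```python
def n_reaction_effect(r_nb, r_ni, r_lb, r_li):
    """Placement advantage (B minus I) under normal listening, minus the
    placement advantage (B minus I) under a learning set."""
    return (r_nb - r_ni) - (r_lb - r_li)


# Percentage recall figures quoted in the Results section.
recall_pct = {"NB": 21, "NI": 12, "LB": 31, "LI": 42}

advantage_n = recall_pct["NB"] - recall_pct["NI"]
advantage_l = recall_pct["LB"] - recall_pct["LI"]
effect = n_reaction_effect(recall_pct["NB"], recall_pct["NI"],
                           recall_pct["LB"], recall_pct["LI"])

print("B minus I under normal listening:", advantage_n)
print("B minus I under learning set:", advantage_l)
print("difference attributed to N reactions:", effect)
```

On these figures B leads I by 9 percentage points under normal listening and trails it by 11 points under a learning set, so the difference attributed to N reactions is 20 percentage points.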

It is conceivable, however, that such differences 
in recall as might occur could arise out of un- 
planned group differences in respect of personal 
characteristics and testing conditions. To provide 
a check on this possibility a control device was 
incorporated into the design. Advertisements B 
and I were carried by identical programs, though 
of course they were recorded on different wires. 
Sections of additional advertisement were intro- 
duced in equivalent positions on each wire. These 
two additional sections constituted the control 
material on each wire and each group was also 
tested for recall of this control material. If un- 
planned differences had not occurred, then differ- 
ences in recall of control interest in either the N 
or the L situations should not be significant. De- 
tails of this control device are presented in Fig- 
ure 1. 

Material. Material used included two wire re- 
cordings of commercial programs, playback equip- 
ment, program opinion sheets and question book- 
lets. 

Two Wire Recordings of Commercial Programs. 
The advertisements were carried by the 
program “Sherry and Son.” As shown in Figure 1, 
the advertisements fell into several parts. 
On each wire there was an opening advertisement 
and an end advertisement. On one wire, however, 
half the opening advertisement had been 
shifted to the middle of “Sherry and Son.” This 
was the I placement. On the other wire the 
equivalent section of the advertisement was not 
moved and it was this placement which was 
called B. Hence it will be seen that as far as 
placement was concerned, the two advertisements 
had a preceding and a following statement in 
common, and it was this additional material 
which constituted the control. On the other 
hand, the experimental material was B/I. 


Fig. 1. Text of the two advertisements and their positions relative to program and control material. 
Items on which recall was tested are italicized. 

Beginning Placement of the Advertisement 

(Part control material) We have pleasure in presenting to you the 
story of “Sherry and Son,” brought to you by Raymonds, the makers of 
distinctive sweets. Have you tasted Crunch Block, the sweet with the 
special flavour? It’s made by Raymonds, the sweet makers of distinction. 
Only the best ingredients go into it: honey, glucose and nuts, all of 
them ideal for sweets. Crunch Block is really worth tasting and is 
obtainable at the manufacturer’s own store in the Royal Arcade and at 
confectioners, grocers and milk-bars. 

(B) Raymonds, the makers of Crunch Block, have developed a new process 
called Bubbling which makes the texture of the sweet fine and smooth. 
This product is vitamin packed and has special nutritive value and is 
manufactured under strictly hygienic conditions. Crunch Block costs 
threepence and there is no shortage of supply. 

“SHERRY AND SON” (ALL) 

(Part control material) This program is brought to you by Raymonds, 
the makers of Crunch Block, the sweet of distinction. Don’t forget to 
ask for it; you’ll enjoy its special flavour. 

Interruption Placement of the Advertisement 

(Part control material) We have pleasure in presenting to you the 
story of “Sherry and Son,” brought to you by Raymonds, the makers of 
distinctive sweets. Have you tasted Crunch Block, the sweet with the 
special flavour? It’s made by Raymonds, the sweet makers of distinction. 
Only the best ingredients go into it: honey, glucose and nuts, all of 
them ideal for sweets. Crunch Block is really worth tasting and is 
obtainable at the manufacturer’s own store in the Royal Arcade and at 
confectioners, grocers and milk-bars. 

FIRST HALF OF “SHERRY AND SON” 

(I) And now we briefly interrupt our story. Raymonds, the makers of 
Crunch Block, have developed a new process called Bubbling which makes 
the texture of the sweet fine and smooth. This product is vitamin 
packed and has special nutritive value and is manufactured under 
strictly hygienic conditions. Crunch Block costs threepence and there 
is no shortage of supply. 

SECOND HALF OF “SHERRY AND SON” 

(Part control material) This program is brought to you by Raymonds, 
the makers of Crunch Block, the sweet of distinction. Don’t forget to 
ask for it; you’ll enjoy its special flavour. 

Program Opinion Sheet. This was a single 
sheet which asked for written opinions of each 


of three programs. It was part of the tech- 
nique used in deceiving subjects into reacting 
normally to the advertisement. Details follow. 

Instructions. N. Situation. The wire carry- 
ing B was played to G1 and that carrying I to 
G2. Subjects were told that the purpose of the 
session was to get their opinions of these pro- 
grams and were provided with opinion sheets for 
this purpose. These programs were said to be 
taken direct from the library of one of the com- 
mercial radio stations and to be just as they 
would be if they were going on the air. The 
first of the programs, a collection of three vocal 
items called “Just for You,” was played with its 
advertisement; the playback machine was stopped 
and subjects were asked to write their candid 
opinions of the program. The purpose of this 
was to facilitate the deception. When subjects 
had finished this, they heard “Sherry and Son” 
with its advertisement and subsequently wrote 
opinions of that program too. After this the 
question booklets were distributed: there had 
been no warning at all of this step. Subjects 
were asked to write down their feelings (like/dislike) 
about the advertisement in “Sherry and 
Son” and about advertising in general. They 
were then given recall tests by specific question¹ 
and multiple choice methods. 

L. Situation. The wire carrying B was played 
to G3 and that carrying I to G4. Subjects were 
told that while they would be asked for opinions 
of the programs, their main job was to listen to 
the advertisement in “Sherry and Son” and that 
they would be required to recall it at the end of 
the program. They were asked to concentrate 
on remembering the advertisements and to keep 
out of the picture any attitude they may have 
towards radio advertising in general or towards 
this particular advertisement. Opinions of the 
programs were asked for and recall tests were 
made as with G1 and G2. 

Scoring. Items on which scores were based 
are those italicized in Figure 1. Marks were 
given for each correct reproduction, one on the 
specific question system and one on the multiple 
choice system, making a total of 16 marks on 
the 8 items included in the B/I material and a 
possible 28 marks on the 14 items included in 
the control material. Marks for the recall of the 
names of producer and product were excluded 
from totals because these items were common to 
the B/I and the control material. 

¹ What was the name of the product being advertised? 
What were the contents, the ingredients, 
of the product? I mean what did it have in it? 


Table 1 

Recall Scores* on Beginning and Interruption Advertisements 

                     Beginning         Interruption      Significance of 
Conditions of        Advertisement     Advertisement     Difference† 
Exposure             Mean     SD       Mean     SD       CR       P 

Normal Reaction      ...      ...      1.88     2.17     2.46     0.016 
Learning Set         ...      ...      6.67     2.87     2.26     0.032 

* Score out of 16 marks. 
† Two tails of the distribution. 


Results 


From Tables 1, 2, and 3 it will be seen that 
in the N situation recall of I is very sig- 
nificantly less than recall of B (P = .016), 
whereas in the L situation, recall of I is very 
significantly greater than recall of B (P 
= .032). This represents a large and sig- 
nificant reversal of the advantage of I in 
going from the L to the N situation (P 
= .002). Expressed in terms of percentage 
of recall, B was recalled in the N situation 
about twice as well as I (21% vs. 12%), 
whereas in the L situation B was recalled only 
three quarters as well as I (31% vs. 42%). 
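
The percentages in this paragraph can be related back to the mark totals. The sketch below assumes that percentage recall means mean marks divided by the 16-mark maximum for the B/I material; the paper does not spell this out. It uses the two interruption-advertisement means that survive in Table 1 (1.88 and 6.67), and the variable names are illustrative.

```python
MAX_MARKS_BI = 16  # 8 items x 2 marks each, per the Scoring paragraph

# Mean marks for the interruption advertisement as they survive in Table 1.
mean_marks_i = {"N": 1.88, "L": 6.67}
pct_i = {cond: 100 * marks / MAX_MARKS_BI for cond, marks in mean_marks_i.items()}

# Percentages quoted in the text for both placements.
quoted_pct = {"N": {"B": 21, "I": 12}, "L": {"B": 31, "I": 42}}
b_to_i_ratio = {cond: p["B"] / p["I"] for cond, p in quoted_pct.items()}

for cond in ("N", "L"):
    print(f"{cond}: I = {pct_i[cond]:.1f}% of marks, "
          f"B/I recall ratio = {b_to_i_ratio[cond]:.2f}")
```

On this reading the interruption means correspond to about 12% and 42% of the possible marks, matching the text. The B-to-I ratios come to 1.75 under normal listening and about 0.74 under a learning set, which the text rounds to "about twice" and "three quarters."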

This phenomenon does not, for the following 
reasons, appear to be an artifact arising 
out of unplanned differences between groups 
in respect of personal characteristics or testing 
conditions. First, recall of control material 
occurring with B and I, respectively, 
does not differ significantly in either the N 
(P = .64) or the L (P = .57) situations, 
while the slight advantage in the L situation 
of control material occurring with B is 
maintained in the N situation (P = .79). 
Secondly, the four groups appeared to be well 
matched and test conditions did not noticeably 
deviate from plan. 


Table 2 

Recall Scores* on Control Material Occurring with the Beginning and 
Interruption Advertisements, Respectively 

                     Beginning          Interruption 
Conditions of        Advertisement†     Advertisement†     Significance of 
Exposure             Mean      SD       Mean      SD       Difference 

Normal Reaction      7.92      2.90     8.26      ...      ... 
Learning Set         10.75     4.76     11.42     ...      ... 

* Score out of 28 marks. 
† Control material occurring with the experimental placement. 


Table 3 

Characteristics of Matched Groups* 

          Intelligence†        Age 
Group     Mean     SD       Mean      SD       Sex      Size of Group 

G1        5.04     0.81     26.40     2.26     male     31 
G2        5.30     0.71     25.10     2.00     male     41 
G3        5.09     0.87     25.17     2.44     male     24 
G4        5.30     0.82     25.00     2.60     male     24 

* The occupation and background of the four groups were the same: trainee carpenters under the 
Commonwealth Reconstruction Training Scheme; ex-servicemen (non-commissioned) recruited from 
semi- or unskilled occupations. 
† In standard scores. 

Moreover, this reversal phenomenon was 
repeated with each of the 8 items in B/I on 
which recall tests were made. 


Discussion 


It is not difficult to theorize about the 
causes of the relative disadvantage of the I 
placement. A theory of inhibition through 
hostility would not only have a certain plausi- 
bility, but would also be supported by the 
fact, already reported, that in an accompany- 
ing survey in Sydney (2), the interruption 
placement emerged as the most disliked form 
of radio advertising. Some caution is needed, 
however, for it was also found (2) that ver- 
balized attitude (in terms of like/dislike), 
whatever its concomitant organic processes 
might be, was not correlated with recall (r 
= + .01+.11 with intelligence partialled 
out). Under the circumstances, there is some 
case for suggesting the existence of a gen- 
eralized defense mechanism of an involuntary 
type—a theory which gains additional sup- 
port from a further finding (2) of very little 
or no correlation between alleged degree of 
attention to the advertisement and recall (r 
= —.14+.11 with intelligence partialled 
out). Quite clearly, however, further theo- 
rizing and research are required at this point. 


A second interpretative point must be made. 
The present use of “captive audiences” leaves 
out of account certain aspects of the real home 
listening situation: people listening at home 
are usually free, during the broadcast of an 
advertisement, to walk about, talk or turn 
the set off. In fact, one of the claimed ad- 
vantages of the interruption placement of an 
advertisement is that people are less likely to 
walk about, tune out, etc., during the middle 
of a program than at the beginning. The 
present study was not, however, directly con- 
cerned with this issue, although the issue is 
one on which research might well be con- 
ducted. 


Summary and Conclusions 


The prime purpose of this inquiry was to 
compare, under normal listening conditions, 
rates of recall of an advertisement placed at 
the beginning and in the middle of a program. 
There was, in fact, a good case for theorizing 
that reactions such as hostility, inattention 
and defense mechanisms would normally enter 
the perception processes to the special detri- 
ment of the interruption type of advertise- 
ment. 

While a superiority in recall of the begin- 
ning advertisement would concur with this 
theory of differential reaction, a crucial test 
would also require a comparison of recall rates 
when such reactions were eliminated. Hence 
the full investigation involved a comparison 
of recall of the two advertisements after (a) 
normal reaction and (b) the establishment of 
a learning set. 

Four matched groups were exposed in pairs 
under conditions (a) and (b), respectively, 
to a specially designed advertisement, one of 
each pair hearing the beginning placement and 
the others the interruption placement. A 
fairly elaborate administrative procedure was 
used to evoke “normal” reactions to the ad- 
vertisements. A control device was designed 
to detect differences in recall arising out of 
unplanned group differences in respect of per- 
sonal characteristics and testing conditions. 

Results showed that normal reaction to the 
advertisement in the interruption placement 
interfered with perception much more than 
did normal reaction to it in the beginning 
placement. The difference was, in fact, such 
that the very significant advantage of the in- 
terruption advertisement under learning set 
conditions of exposure was reversed, and very 
significantly so, when normal reactions took 
place. 


Received November 24, 1952. 


Check-Reading as a Function of Pointer Symmetry and Uniform 
Alignment ¹ 


Keith W. Johnsgard 
The State College of Washington 


During the past several years an extensive 
program of psychological research has been 
carried on dealing with human reactions to 
aircraft instrument panels. This program, 
undertaken primarily by the United States Air 
Force, is an attempt to simplify the increas- 
ingly complex task of reading aircraft instru- 
ments under flight conditions. 

Aircraft instruments serve for three basic 
types of reading: (1) check-reading for as- 
surance of a normal indication; (2) qualita- 
tive reading for the meaning of a deviation; 
and (3) quantitative reading for the actual 
numerical value of an indication (5). This 
paper is concerned with the first type and em- 
ploys the rotating pointer type indicator which 
has been shown to facilitate short latency re- 
sponses with a minimum of errors (3). 

A recent study has indicated that rectangu- 
lar arrangement of small engine instruments 
on multi-engine aircraft and the use of rotata- 
ble dials, making possible uniform pointer 
alignment under flight conditions, will provide 
a significant advantage in speed and accuracy 
of check-reading (6). However, another proj- 
ect concerning recognition span made it ap- 
parent that check-reading is facilitated by 
pointer symmetry even when the pointers are 
not all in the same standard position such as 
9 o'clock (8). Both uniform pointer align- 
ment and pointer symmetry are superior to 
mixed alignment for check-reading. The 
development of rotatable dials makes these 
principles applicable to engine instrument 
panels. With this arrangement dials could be 
fixed in such a manner that the pointers would 
form any pattern that would facilitate rapid 
and accurate check-reading wherein any de- 
viation could be quickly identified in terms 
of direction, engine, and function. 

¹ Submitted in partial fulfillment of the requirements 
for the M.S. degree in the Department of Psychology 
in the Graduate School of the University of 
North Dakota. The author wishes to thank Dr. 
Hermann F. Buegel under whose direction the research 
was conducted. 

A basic problem then is a consideration of 
which type of pointer-pattern would facilitate 
the most efficient check-reading. It is this 
problem with which this paper is concerned. 


Method 


Stimulus Preparation. Four sixteen-dial panels 
containing different pointer patterns were used 
and will be referred to as configurations through- 
out the remainder of this report. A null hy- 
pothesis was stated that the four configurations 
would equally facilitate check-reading. The con- 
figurations are shown with pointers in a null po- 
sition in Figure 1. 

Nineteen stimulus panels were prepared for 
each configuration. One panel showed the point- 
ers in a null position with the remaining eighteen 
panels for a particular configuration containing 
dials in which pointers were deviating from null 
position. These eighteen panels were split into 
six blocks of three panels each. Panels of the 
first block each contained one deviating pointer, 
panels of the second containing two, and so on 
with each panel of the sixth block containing six 
deviating pointers. A complete set of eighteen 
test panels for any one configuration contained a 
total of 63 deviating pointers. The dial or dials 


Fig. 1. The four configurations with pointers in a
normal or correct position.

within a particular panel that were to contain
deviating pointers were chosen in a random manner (4).
The position of the deviating pointer within an error
dial was chosen in the same way, requiring a possible
discrimination ranging from a maximum of 180 degrees
to a minimum of 15 degrees from the null pointer position.
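
The panel design described above can be sketched in a few lines. This is only an illustration of the block structure: the 15-degree step between possible deviations and the random seed are assumptions, and the function name `build_test_panels` is hypothetical, not taken from the study.

```python
import random

N_DIALS = 16          # dials per panel
PANELS_PER_BLOCK = 3  # three panels in each block
N_BLOCKS = 6          # panels in block k contain k deviating pointers
# Assumption: deviations fall in 15-degree steps from 15 to 180 degrees.
DEVIATIONS = list(range(15, 181, 15))

def build_test_panels(seed=0):
    """Return 18 panels; each maps a dial index (0-15) to a deviation in degrees."""
    rng = random.Random(seed)
    panels = []
    for k in range(1, N_BLOCKS + 1):
        for _ in range(PANELS_PER_BLOCK):
            dials = rng.sample(range(N_DIALS), k)  # k distinct error dials
            panels.append({d: rng.choice(DEVIATIONS) for d in dials})
    return panels

panels = build_test_panels()
total_deviating = sum(len(p) for p in panels)
print(len(panels), total_deviating)  # 18 63
```

The total of 63 follows from 3 x (1 + 2 + 3 + 4 + 5 + 6), whatever random choices are made.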

Stimulus material was prepared from 35 mm. 
negative film with projected dial borders and 
pointers appearing white on a dark background 
when flashed on a screen (2, 11). The diameter 
of a projected dial was three inches (7, 9, 10, 
13). Pointer width was approximately 3/32 inch 
extending from the center of the dial to the 
border (1). 

Physical Conditions. A modification of the 
Whipple pendulum type tachistoscope was used. 
The noiseless mechanism is described and pic- 
tured in a recent publication (11). Stimulus ma- 
terials were contained in a slide projector behind 
the tachistoscope. The projector was 34 inches 
from the floor with the light beam projected 
horizontally to a screen seven feet away. The
brightness of the illuminated screen when using transparent
slides was approximately eight foot-lamberts (2).

Two S’s were tested simultaneously, and were 
seated on each side of the projected beam in two 
single-arm writing desks pointed directly at the 
center of the projected panel. The distance from 
the projection area to the S’s eyes was approxi- 
mately 50 inches (1). 

The testing room was darkened to maintain ef- 
fective contrast (14). Directly over the subjects 
a soft beam of light was directed downward to 
facilitate writing responses on score sheets. The 
beam did not affect screen brightness. 

An exposure time of a half second was allowed. 
This is the average fixation time for pilots en- 
gaged in instrument flying (8). 

The Sample. The sample population consisted 
of 48 male students enrolled at the University of 
North Dakota. The ages ranged from 18 through 
37, with all but 7 of the S’s between the ages of 
18 through 26. Visual acuity was checked before 
experimentation with a Snellen Eye Chart. A 
criterion of 20/25 was set as a minimum of visual 
acuity. Subjects with sight corrected to this 
criterion by glasses were considered satisfactory 
for experimental purposes. None of the S’s had
any tachistoscopic training of any kind.

Test Procedure. After being checked for eye- 
sight, the two S’s were seated, acquainted with 
response sheets, and instructed. The response 
sheets contained 72 sixteen-dial panels numbered 
in the same manner as the test panels. Before 
being presented with a set of 18 test panels the 
S’s were allowed to study the projected normal 
configuration panel with all needles in a null po- 
sition. This normal panel was then shown three 
times with a half second exposure. It was ex- 
plained that on the test panels to follow not all 
of the dials contained pointers in the normal position.
S’s were instructed to check the appropriate
dial or dials on the response sheets that
corresponded to those on the test panel contain- 
ing deviating pointers. Presentation of the 18 
test panels for the appropriate configuration fol- 
lowed. Before an individual panel presentation 
the S was informed of the panel number and was 
given a ready signal. Approximately 1 second 
later the exposure occurred. After each exposure 
as much time as was needed was allowed for responding.

Following observation of two sets of test panels 
a short rest was allowed. The entire test period 
varied from 35 to 45 minutes depending on speed 
of response. 

In an attempt to eliminate practice effect from 
the total group results the order in which the 
four tests were administered was varied. One 
group of 12 S’s observed the configurations in 
numerical order. A second group began with 
configuration 2, a third group with configuration 
3, and a fourth with configuration 4. S’s for each 
group were selected at random. 
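
A minimal sketch of this counterbalancing follows. The text states only which configuration each group began with; the cyclic rotation of the remaining configurations is an assumption, and the names `rotated_orders` and `assign_subjects` are hypothetical.

```python
import random

CONFIGS = [1, 2, 3, 4]

def rotated_orders(configs):
    """Cyclic rotations, so each order starts with a different configuration."""
    return [configs[i:] + configs[:i] for i in range(len(configs))]

def assign_subjects(n_subjects=48, seed=0):
    """Randomly split subjects into equal groups, one group per presentation order."""
    rng = random.Random(seed)
    subjects = list(range(1, n_subjects + 1))
    rng.shuffle(subjects)
    orders = rotated_orders(CONFIGS)
    size = n_subjects // len(orders)
    return {tuple(order): sorted(subjects[i * size:(i + 1) * size])
            for i, order in enumerate(orders)}

groups = assign_subjects()
for order, members in groups.items():
    print(order, len(members))
```

Under this rotation each configuration occupies each ordinal position for exactly one group of 12, so an over-all practice effect is spread evenly across configurations in the group totals.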

Method of Scoring. One point was allowed for 
each dial correctly identified as containing a 
deviating pointer. Four total configuration scores 
were computed for each paper. Each of these 
total scores was the sum of six sub-scores. The 
sub-scores were the correct responses made to 
each of the six sets of three panels that contained 
from one to six deviating pointers. It was not 
considered necessary to penalize incorrect re- 
sponses. The method was arbitrary. 
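
The scoring rule can be illustrated as follows. The data layout (one collection of dial indices per panel, in block order) and the function names are assumptions made for the sketch; the answer key shown is hypothetical.

```python
def score_panel(marked, deviating):
    """One point per dial correctly marked as deviating; false marks are not penalized."""
    return len(set(marked) & set(deviating))

def score_configuration(responses, answer_key):
    """Score 18 panels given in block order (three panels per block).

    Returns (list of six sub-scores, total configuration score).
    """
    sub_scores = []
    for block in range(6):
        panel_ids = range(block * 3, block * 3 + 3)
        sub_scores.append(
            sum(score_panel(responses[i], answer_key[i]) for i in panel_ids)
        )
    return sub_scores, sum(sub_scores)

# Hypothetical key: each panel in block k has dials 0..k-1 deviating.
key = [set(range(k)) for k in range(1, 7) for _ in range(3)]
perfect = score_configuration(key, key)
only_dial_zero = score_configuration([{0}] * 18, key)
print(perfect)         # ([3, 6, 9, 12, 15, 18], 63)
print(only_dial_zero)  # ([3, 3, 3, 3, 3, 3], 18)
```

The perfect score shows the maximum imposed by the design: 3, 6, 9, 12, 15, and 18 points for the six blocks, or 63 in all.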


Results 


Means of total correct responses in locating 
error dials in the entire set of 18 slides for 
each configuration are listed in Table 1. To 
test the null hypothesis that the configura- 
tions were of equal difficulty, a small sample 
t test for correlated means was computed. 
The results of this test are indicated in Table 
2. The test showed configuration 3 to be 
significantly superior to the others tested for 
check-reading. The stated null hypothesis 
might safely be rejected. Differences between 


Table 1


Means, Standard Deviations, and Standard Errors
for Total Correct Responses


Configuration     Mean
      1           30.19
      2           31.46
      3           34.48
      4           17.54
Table 2


Confidence Levels and t-Scores Between Total
Correct Response Means


Configuration                    Confidence
Means Compared         t         Level
   1 and 2            1.61       .120
   1 and 3            4.94       Beyond .001
   1 and 4           12.82       Beyond .001
   2 and 3            3.92       Beyond .001
   2 and 4           13.53       Beyond .001
   3 and 4           18.35       Beyond .001


the standard deviations proved to be insig- 
nificant. 
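
For readers unfamiliar with the t test for correlated means, a minimal sketch is given here. The scores are hypothetical and serve only to show the arithmetic; they are not data from this study.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """t statistic for correlated (paired) means, with n - 1 degrees of freedom."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1

# Hypothetical total scores for five subjects on two configurations.
config_a = [30, 28, 35, 31, 27]
config_b = [33, 30, 36, 35, 29]
t, df = paired_t(config_b, config_a)
print(round(t, 2), df)
```

Because every S observed all four configurations, each comparison in Table 2 is of this paired kind, with 47 degrees of freedom for 48 S's.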

The performance curves for the 48 S’s on 
each configuration are shown in Figure 2. 
The curves were fitted by the method of least
squares, and in all cases the fits were very reasonable.
It should be stated that a definite
restriction exists for the interpretation of these 
data. The data are plotted as mean correct 
responses as a function of blocks of three 
trials. However, an experimental maximum is 
imposed by design, since only one error dial 


is contained in each panel of the first block,
two in the second block, and so on with six in
the last block. This factor could have influenced
the first two blocks of trials but perhaps
affected no further blocks. With this
restriction in mind, the data were plotted, 
as curve shape was considered to be important 
with regard to further practice. Examina- 
tion of the formula for configuration 2 is sig- 
nificant in that the suggested asymptote is 
3.14, while those for configurations 1 and 3 
are 2.26 and 2.64, respectively. There exists 
the possibility that with further practice, con- 
figuration 2 might prove to be more useful 
than any of those tested. 

It will be recalled that the order of con- 
figuration presentation was varied in order to 
compensate for possible over-all practice effect. 
With regard to this an analysis of total con- 
figuration scores was made. The mean score 
for the 48 Ss on the first configuration pre- 
sented was 27.42, with the mean score for the 
last configuration presented being 29.56. It 
is evident that some transfer exists between 
configurations. 


Fig. 2. Mean error dials located in each of the four configurations with each
abscissa point representing the mean values of blocks of three trials.
Ordinate: mean correct responses ($R$); abscissa: trials ($T$). Fitted curves:

$C_1$: $R = 2.26 - 2.896\,e^{-.2561T}$
$C_2$: $R = 3.14 - 2.67\,e^{-.0885T}$
$C_3$: $R = 2.64 - 2.68\,e^{-.2058T}$
$C_4$: $R = .19T + .31$
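
As a rough consistency check on the figure, the straight line for configuration 4 can be summed over the six blocks. The sketch reads that line as R = .19T + .31 and assumes that R is the mean number of correct responses per panel within block T; both points are interpretations of the figure, not statements made in the text.

```python
PANELS_PER_BLOCK = 3

def r_c4(t):
    """Fitted line for configuration 4, read from Figure 2 as R = .19T + .31."""
    return 0.19 * t + 0.31

# Sum the per-panel means over blocks 1-6 and scale by three panels per block.
predicted_total = PANELS_PER_BLOCK * sum(r_c4(t) for t in range(1, 7))
print(round(predicted_total, 2))  # 17.55
```

The result is close to the mean total of 17.54 listed for this configuration in Table 1. For the three exponential curves the second term vanishes as T increases, so the leading constants (2.26, 3.14, and 2.64) are the asymptotes referred to in the text.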
Grether and Warrick (6) report that about
twice as much time was required to check- 
read a sixteen-dial panel employing four sub- 
groups than to check-read a panel of equal 
dial number with pointers in uniform align- 
ment at the nine o’clock position. This study 
tends to reinforce that finding in that about 
twice as many error dials were found in con- 
figuration 1 with uniform alignment at nine 
o’clock as in configuration 4 which employed 
the four sub-groups. These same investiga- 
tors have shown that the nine o’clock po- 
sition is the most favorable pointer position 
of uniform alignment. This experiment has 
indicated that panels employing pointer symmetry
are as good as the most favorable
position of uniform alignment for check-reading.
The results further suggest with
reasonable assurance, that one of the con- 
figurations (C3) with pointer symmetry is su- 
perior to uniform alignment (C1) and that 
another (C2) might prove superior with prac- 
tice. 

The findings of this experiment would indicate
that rotatable dials and a sixteen-dial
rectangular panel would facilitate check-reading
of aircraft engine instruments. It is likely
that these principles can be applied in most
situations, such as industry, where a multi-engine
arrangement exists and rapid, accurate
check-reading is a necessity.


Summary 


A tachistoscopic study in which simulated 
instrument dials were observed at short ex- 
posure was performed to determine efficiency 
in locating deviating dial pointers in four in- 
strument panels employing the principles of 
uniform alignment, pointer symmetry, and 
sub-grouping of pointer pattern. A null hy- 
pothesis was stated that the four patterns 
would equally well facilitate check-reading. 
S’s totalled 48 naive male students. 

The results of the experimentation allow a 
statement of the following tentative conclu- 
sions: 

1. Configurations in this experiment employing
pointer symmetry facilitate check-reading
as well as do panels with uniform
alignment. There is reasonable evidence
that one of the former type is superior to the
latter in terms of number of correct responses, 
and that another might prove superior with 
practice. The null hypothesis was rejected. 

2. Panels employing pointer symmetry and 
uniform alignment are superior to sub-groups 
for check-reading. 

3. Check-reading improves with a relatively 
short amount of practice and some transfer 
exists between panels with differing pointer 
positions. 

4. It was suggested that the use of a rec- 
tangular sixteen-dial panel of aircraft engine 
instruments with rotatable dials would facili- 
tate rapid check-reading, and that these prin- 
ciples might profitably be applied in other 
situations where multi-engine panels are used.


Received January 12, 1953. 
A systematic investigation of performance
in visual tasks as a function of low photopic 
brightness levels is essential to expand our 
knowledge of adequate visual performance 
levels. Although there have been many stud- 
ies under higher brightness levels (above 1 
foot-lambert) the range between cone thresh- 
old and 1 foot-lambert has been relatively 
neglected. Stimulated by the needs of the 
last world war, interest in this region has been 
increasing and the time seems now appropriate 
for a systematic summary of information in 
this area. When new experiments are added 
to fill the gaps, this should give us a more 
nearly complete theoretical and practical
knowledge of the problem. Senders’ (27)
summary and Rock’s (25) annotated bibliography
(sections B and C) concerning studies
of visual acuity are comprehensive and
should be consulted for a background in this
general area. In this discussion visual acuity
per se will not be considered a measure of
performance and in most studies reported was
a controlled variable.

* This report is a condensation and partial revision
of a doctoral dissertation, the original of which is on
file in the library of the University of Rochester.
The author is indebted to S. D. S. Spragg, University
of Rochester, for his direction and guidance.

The experiments reported here were conducted as
part of a program of research on human factors related
to aircraft instrument lighting carried out on
a research contract (W33-038 ac18317) between the
University of Rochester and the Air Materiel Command,
U. S. Air Forces. They have been reported in
the following technical reports to the Aero Medical
Laboratory of the Air Materiel Command: MCREXD-694-21
and TR 6040.

A number of variables have been shown to
be of importance in the investigation of visual
performance. For example, defects of both
the visual mechanism and the stimulus objects
must be controlled. Ferree and Rand (11,
12) and Sheard (28) have indicated that
ocular defects, presbyopia in their cases, require
increased illumination for adequate performance.
Tinker’s (37) study on illegible
print also indicates that higher brightness
levels are required with this defective stimulus
object. Tinker (36, 38) has shown that
adaptation level of the eye has an enormous 
effect on performance and also on subjects’ 
brightness level preferences. Investigations 
of the effect on visual performance of the 
quality of light have given conflicting results. 
Ferguson and McKellar (9), investigating 
binocular visual observations of a Landolt
ring at brightness levels below 1 foot-lambert, 
found that at 0.5 foot-lamberts the best per- 
formance was with red light, then amber, 
white, blue and the poorest, green. Many 
other studies such as that of Craik (7), who 
investigated legibility of different colored in- 
strument markings at low illumination, have 
found green and blue to be inferior. Hartline 
(15) investigated the relative merits of lights 
of different wave-lengths in the airplane cock- 
pit situation and found the measure of in- 
dividual thresholds to be a good index of 
visual function at low intensity levels. McFarland
(22) recommends that for cockpit
use at night, no wave-length below 620 mμ
(red-orange) be used. Some studies have reported
on the effectiveness of performance
under various wave-lengths of light. Spragg 
and Rock (30) investigating performance of 
reading airplane dials under four different 
wave-lengths and at two illumination levels 
(.01 and 0.1 foot-lamberts) found that within 
the range of colors and brightness studied dial 
reading performance showed no consistent re- 
lationship to the wave-length composition of 
illuminant. These results have been verified 
with performance of flying a Link trainer
under these same conditions. 

The types of performance criterion which 
investigators have used fall into three main 
classes: (a) speed of response, (b) accuracy 
of response, (c) physiological correlates. 
Cobb (5, 6) has investigated speed of vision 
as a function of illumination level. He em- 
ployed various patterns associated with con- 
fusion patterns as stimulus objects and found 
that a logarithmic relationship held for parallel
bars between 1 and 100 foot-candles, but
under more complicated conditions the rela- 
tionship breaks down so that the expected 
gain in sensitivity due to increased intensity is 
not realized. Ferree and Rand (10) investi- 
gating speed of vision as a function of bright- 
ness with special reference to industrial 
situations found that on work of a factory 
type, involving important use of the eyes, 
speed of vision increased as brightness in- 
creased up to a maximum. Many studies of 
reading performance have used speed as a 
criterion, e.g., Tinker (32, 33, 35).

Accuracy, usually in conjunction with 
speed, has been used extensively in practical 
situations as a measure of performance. 
Typical studies are White, Britten,
Ives, and Thompson’s (41) study of ocular 
efficiency and fatigue among letter separators 
as a function of brightness level, and Weston 
and Taylor’s (40) investigation of fine type- 
setting done by hand as a function of bright- 
ness level. Most of the threshold studies, 
however, are accuracy studies which do not 
involve speed of reaction, e.g., Hartline (15), 
Brown (3), Brown and Mize (4), Graham 
and Hunter (14), etc. 

Luckiesh and Moss (17, 18, 19) employed 
such physiological correlates as blink rate, 
heart and pulse rate, metabolic ratios, etc., 
as performance criteria for readability. Mc- 
Farland, Knehr and Berens (23) found that 
metabolic ratio and pulse rate are inadequate 
criteria for reading. Tinker’s (34) numerous 
experiments throw doubt on the use of blink 
rate as a criterion. 

From his consideration of the existing liter- 
ature, the writer believes that in future ex- 
periments on visual performance: (a) visually 
screened subjects should be used so that they 
fall into a “normal” category; (b) the stimu- 
lus objects should be legible and above the 
resolution threshold of the eye at all bright- 
nesses tested; (c) light quality should be 
controlled and specified; (d) light quantity 
should preferably be designated as brightness;
and (e) performance criteria should be
speed and/or accuracy with the possible use 
of certain physiological correlates in the case 
of fatigue studies. 

In consideration of the above, and with the 
aim of contributing experimental data to fill
the gaps in our knowledge of visual perform- 
ance as a function of low brightness levels, 
four representative visual tasks were chosen 
for investigation: (1) judgment of magnitude 
of an illusion; (2) motion threshold; (3) 
depth perception; and (4) a simple addition 
task. Each task was investigated under five 
brightness levels in the crucial range of .005 
foot-lamberts (which is just slightly above the 
values usually stated as cone threshold, i.e., 
.002 to .004 F.-L.) to 1.00 foot-lambert. 


Experiment I. Magnitude of the 
Müller-Lyer Effect


This first experiment is a study relating 
performance in the judgments of equality of 
the two lines of the Müller-Lyer figure to
various low photopic levels of brightness.
The Müller-Lyer, one of the best known visual
geometric illusions, has many variations.
The basic example is a figure consisting of 
two straight parallel lines of equal length. 
Each line is terminated at each end by two 
short oblique lines forming an angle whose 
apex is at the end of the major line. On one 
line the oblique lines extend back toward 
the center of the major line, on the other they 
extend away. The illusion, which is a po- 
tent one, consists in perceiving the latter line 
as longer than the former. 

This illusion was selected as one of the 
visual tasks to be investigated here because it 
is representative of a general class of visual 
illusions and hence is a rather important 
visual perceptual task of judgment. 

Many variables affect the judgment of 
visual illusions. Our past experience, associa- 
tions, demands, desires, and more or less ob- 
scure influences may create illusions. The 
physical characteristics of the stimulus object 
are also of paramount importance. The loca- 
tion of the object in the visual field, the struc- 
ture of the field, equivocal figures, the influ- 
ence of angles, color, irradiation, and bright- 
ness contrast and lighting and shadows are 
just a few of the main variables causing illu- 
sions. 

Our present interest is to investigate the 
relation of the magnitude of effect of an il- 
lusory figure to low photopic brightness levels. 
We want to answer the following two questions:
Do geometric illusions increase in effect
under low brightness levels as compared with 
ordinary illumination? Is there a critical 
value of illumination above which the mag- 
nitude of the illusion does not vary with 
brightness? 


Method 


Apparatus. The general plan of the ap- 
paratus followed that employed by Spragg 
and Rock (29) in their studies of dial read- 
ing performance as related to low photopic 
illumination levels. 


The subject was seated in a three-sided booth,
approximately 4 × 4 feet, facing the middle wall.
The entire visual field was painted a matte black.
Placed in the 14 × 11 inch aperture of the front
wall was the Müller-Lyer figure. The center of
the figure was 28 inches from the subject’s eyes
and 15° below his horizontal line of regard. An
adjustable head rest, mounted on a horizontal
bar, served to keep the subject's head in a satisfactorily
constant and comfortable position.

The stimulus object was a Müller-Lyer figure,
with a 7.5 cm. stationary standard arrow-headed 
part. On the subject’s right side was a sliding 
board on which there was a line with an arrow 
feather at one end. This side of the figure could 
be moved under the standard until its line was 
the desired length. The lines were 2 mm. in 
width, white on a black background. The ob- 
liques were 3 cm. and at a 27° angle. The vari- 
able stimulus was moved by the experimenter by 
means of a rack and pinion which could be varied 
by equal steps in a smooth manner. 
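
The visual angles implied by these dimensions can be estimated as follows. The sketch assumes that the 28-inch viewing distance is measured along the line of sight to the center of the figure and that the subject looks directly at it; the function name is hypothetical.

```python
import math

INCH_CM = 2.54

def visual_angle_deg(size_cm, distance_cm):
    """Angle subtended at the eye by an object centered on the line of sight."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))

distance_cm = 28 * INCH_CM                                 # 71.12 cm
standard_deg = visual_angle_deg(7.5, distance_cm)          # 7.5 cm standard line
line_width_arcmin = visual_angle_deg(0.2, distance_cm) * 60  # 2 mm line width
print(round(standard_deg, 2), round(line_width_arcmin, 1))
```

On these assumptions the standard line subtends about 6 degrees and the line width a little under 10 minutes of arc.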

The light sources were two 60 w. Mazda lamps 
in cans fitted with filters and aperture holders. 
An assembly consisted of a ground-glass square 
of heat-resistant glass, and a brass plate with a 
circular aperture drilled in the center. Voltage 
was maintained at a constant level by means of 
a Variac, Model V-5MT, and a monitoring Weston
A.C. Voltmeter, Model 433. The color temperature
was in the neighborhood of 2400° K.
Chosen levels of illumination were achieved by 
means of accurately drilled apertures in remov- 
able brass plates. All light sources had two 
ground-glass surfaces in the optical pathway to 
achieve high dispersion. 

Data sheets were prepared in advance. These 
indicated the five levels of illumination with 
columns under each for recording the subject’s 
responses, and a section for the subject’s com- 
ments. 

Subjects. Ten male subjects served as the ex- 
perimental population. All were students at the 
University of Rochester (three graduates and 
seven undergraduates) and were all in their late 
teens or twenties in age. Subjects chosen were 
those who passed a rigorous visual screening, 
using the Keystone Telebinocular. All subjects
had: normal ophthalmoscopy; 20/20 visual acuity,
monocularly and binocularly, at distance and
near, without glasses; 80 per cent or better
stereopsis; no vertical imbalance; less than 6
prism diopters physiological exophoria; less than
2 prism diopters of exophoria or esophoria at
distance; and normal color vision.

Procedure. Each subject was allowed to become
cone dark adapted (approximately ten
minutes) before the illumination was turned on. 
With the stimulus object illuminated with .05 
foot-lambert of brightness, the subject was given 
the instructions: 

This is an experiment to determine the influ- 
ence of varying brightnesses of illumination on 
the perception of the length of a line complicated 
so as to present an illusion. Yes, this is a very 
common illusion, but I want you to tell me when 
this line (pointing to the variable) looks to you 
to be equal to this stationary line. (With the 
variable much larger than the standard.) Now, 
the line is much larger than the standard. You 
are to say “Now” when the lines appear to you 
to be equal in length. (Decrease the size of the 
variable stimulus so that it is much smaller than 
the standard.) Now the line is smaller than the
standard. Say “Now” when it appears to you
that the lines are equal.

Five practice trials were given.

On the formal trials each subject was first pre- 
sented with the variable stimulus larger than the 
standard. The stimulus was decreased by the experimenter
in successive steps of ½ mm. until
the subject reported equality. Then the experi- 
menter presented to the subject a stimulus smaller 
than the standard, and the stimulus was altered 
by successive increments until the subject reported
he no longer perceived any difference.
This is a modified method of limits called the 
method of equivalents, in which only points of 
equality are recorded, approached from the two 
possible directions. Ten responses at each illumi- 
nation level for each subject were required, five 
ascending and five descending. It is realized that 
space errors are present in this method, but as 
we are interested in differences between perform- 
ance at the various levels of brightness and since 
these errors are presumably constant, they should 
not bias the results. 
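
A small sketch of how the settings from such a series might be summarized is given below. The settings are hypothetical, and the definition of error as the signed difference between the variable setting and the standard is an assumption; the text does not state how the errors were computed.

```python
from statistics import mean

STANDARD_MM = 75.0  # the 7.5 cm standard line

def equivalence_summary(ascending, descending, standard=STANDARD_MM):
    """Summarize the settings (in mm) at which the variable was judged equal."""
    errors = [s - standard for s in ascending + descending]
    return {
        "mean_error": mean(errors),
        "ascending_minus_descending": mean(ascending) - mean(descending),
    }

# Hypothetical settings for one subject at one brightness level.
ascending = [77.5, 78.0, 77.0, 78.5, 77.5]
descending = [77.0, 77.5, 76.5, 77.0, 76.5]
summary = equivalence_summary(ascending, descending)
print(summary)
```

The second entry isolates the difference between ascending and descending series, which is constant in direction and therefore does not bias comparisons between brightness levels.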

The levels of illumination were chosen to en- 
compass the critical range of low photopic bright- 
ness as found in the experiment of Spragg and 
Rock (29). The levels range from just above 
cone threshold to 1 foot-lambert. The values 
used were: .005, .01, .05, .1, and 1 foot-lamberts. 
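
For readers working in metric units, the five levels can be converted to candelas per square meter. The conversion uses the definition of the foot-lambert as 1/π candela per square foot; the function name is hypothetical.

```python
import math

FT2_IN_M2 = 0.3048 ** 2  # square meters per square foot

def foot_lamberts_to_cd_m2(fl):
    """Convert foot-lamberts to cd/m^2 (1 fL = 1/pi candela per square foot)."""
    return fl / (math.pi * FT2_IN_M2)

levels_fl = [0.005, 0.01, 0.05, 0.1, 1.0]
levels_cd_m2 = [foot_lamberts_to_cd_m2(fl) for fl in levels_fl]
for fl, cd in zip(levels_fl, levels_cd_m2):
    print(f"{fl:>5} fL = {cd:.4f} cd/m^2")
```

One foot-lambert is about 3.43 cd/m², so the range studied runs from roughly 0.017 to 3.4 cd/m².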

Brightness measurements were made with a 
Macbeth Illuminometer used in the subject’s po- 
sition and directed against a white square painted 
with the same paint as that of the stimulus ob- 
ject. 

Since five levels of illumination were used it 
was necessary to employ balanced sequences of 
brightness levels to control possible practice and
fatigue effects. In changing from one level to 
another the subjects were given from five to ten 
minutes for adaptation. 

Each subject was given a visual screening test 
on one day, and the entire series of judgments 
on another day. 


Results 


The data of this experiment consist of judg- 
ments of equality of length of lines in the 
Müller-Lyer figure for ten subjects under five
different brightness levels. 

The mean errors of judgment are presented 
in Table 1 which shows for each subject the
mean error, sigma, standard error, and per 
cent error of standard at each of the five 
brightness levels. 
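
The summary rows of Table 1 can be approximately reproduced from the ten subject means in its first column (0.005 foot-lamberts). Two points in the sketch are inferences from the tabled figures rather than statements in the text: the standard error appears to be sigma divided by the square root of N - 1, and the percentages correspond to the mean divided by 7.5.

```python
import math
from statistics import mean, pstdev

STANDARD = 7.5  # divisor implied by the tabled percentages (an inference)

# Mean errors of the ten subjects at 0.005 foot-lamberts, from Table 1.
errors = [2.4, 2.9, 1.9, 3.7, 1.9, 2.7, 3.0, 2.2, 1.5, 2.1]

total = sum(errors)
m = mean(errors)
sigma = pstdev(errors)                   # population SD (divides by N)
se = sigma / math.sqrt(len(errors) - 1)  # assumed convention: sigma / sqrt(N - 1)
pct = 100 * m / STANDARD
print(round(total, 1), round(m, 2), round(sigma, 2), round(se, 2), round(pct, 1))
```

The sum and mean agree with the tabled 24.3 and 2.43; sigma, the standard error, and the percentage come out within about .01 of the tabled 0.61, 0.20, and 32.5 per cent, the small differences presumably reflecting rounding of the individual entries.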

Inspection of Table 1 and Figure 1 shows 
that: (a) the effect of the illusion at the three 
higher brightness levels is considerably less 
than at the two lower levels; (b) above 0.05 
foot-lamberts increasing brightness produces 
no significant improvement in performance; 
while (c) below 0.05 foot-lamberts decreas- 
ing brightness is clearly associated with poorer 
performance on this task. 

Since our principal concern is with performance
as a function of brightness, a t


Table 1


Showing the Magnitude of Errors in mm. in Judgments
of Equality of the Müller-Lyer Figure
at Five Brightness Levels


                   Brightness in Foot-Lamberts
Subjects       0.005    0.01     0.05     0.10     1.00
 1              2.4      1.7
 2              2.9      2.4
 3              1.9      1.5
 4              3.7      3.8
 5              1.9      1.5
 6              2.7      2.7
 7              3.0      2.4      1.0
 8              2.2      2.2      1.3
 9              1.5      1.3
10              2.1      1.5      1.3      1.1

Sum            24.3     21.0     14.2     13.3     13.8
Mean            2.43     2.10     1.42     1.33     1.38
σ               0.61     0.73     0.45     0.71     0.63
SE              0.20     0.24     0.15     0.24     0.21
% of Stand.    32.5%    28.0%    18.9%    17.7%    18.4%

Visual performance as a function of low
photopic brightness levels. 


analysis was carried out comparing group 
performance for each pair of brightness levels. 
A summary of this analysis is presented in 
Table 2. From Table 2 it is seen that all 
differences between brightness levels below 
.01 foot-lamberts and those above .01 foot- 
lamberts are significant at the 1 per cent level. 

On the basis of the data presented above, 
it seems clear that: (1) there is a larger error 
in perception of length as tested in the Müller-Lyer
figure below the region of .05 foot-lamberts;
and (2) little or no increase in
performance results from increasing bright- 
ness above this level. 

Practice Effects. It will be recalled that 
each subject was given five practice trials before
formal trials were begun, in order to reduce
practice effects. The adaptation periods
between levels would tend also to reduce 
practice effects. In order to determine 
whether practice effects were playing a sig- 
nificant role in this situation the errors were 
tabulated for each subject in terms of first 
brightness level tested, second level tested, 
etc. Since each brightness level appeared in 
each ordinal position the same number of 
times, no advantage due to sequence is pres- 
ent for any brightness level. 

The t tests of the several differences indicate
that there is no evidence of a practice
effect. Early experimenters with illusions 
noticed that continued experience with one 
certain figure diminished the amount of the 
illusion. Heymans (42) and particularly
Judd (42) made a systematic study of the 
Table 2


Values of t, Comparing Mean Magnitude of Errors in
Judgments of Equality of the Müller-Lyer
Figure at Five Brightness Levels


             Brightness in Foot-Lamberts
            .005       .01       .05      .10
 .01       3.62**
 .05       7.06**     6.42**
 .10       7.05**     7.06**     1.34
1.00       7.45**     6.32**     0.80     0.77


** Significant at 1 per cent level.


practice effect and found that the illusion 
gradually diminished and approached zero. 
The practice effect held good only for the 
original position of the figure. Reversal of 
figures returned the illusion to full strength. 
The illusion was revived even in the original 
figure by standing off and looking at it casually
as a whole. In our experimental procedure
the effect of practice was reduced satisfactorily
by the use of units of experimentation
containing a small number of trials separated
by adaptation periods.

Although the experimental design was such 
as to minimize the effects of errors of habitua- 
tion and/or expectancy (by giving ascending 
and descending trials alternately and changing 
the length of the trials) the analysis of the 
ascending and descending series at each 
brightness level shows the ascending series 
to be greater in all cases than the descending 
series. The differences were: .58 mm. at
.005 foot-lamberts, .74 mm. at .01 foot-lam- 
berts, .96 mm. at .05 foot-lamberts, 1.06 mm. 
at .10 foot-lamberts, and .92 mm. at 1.00 foot- 
lamberts. These are errors of habituation. 

All subjects commented that judgments 
were more difficult to make under the two 
lower brightnesses but they all reported that 
they thought they did as well under the lower 
brightnesses as under the higher levels. 


Discussion 


The results reported above indicate for this 
task a critical level of brightness (about .05 
foot-lamberts) below which the magnitude of perceptual
errors increases significantly from the
errors at and above this brightness level. 


Above this critical level further increases in
brightness up to 1 foot-lambert (and possibly
indefinitely) produce no significant increments
of performance. It would seem as though once a
subject has been given sufficient brightness to
perform the task with ease, brightness is no
longer a significant variable.


Experiment II. Absolute Motion Threshold


One of the first questions to be raised with
reference to the amounts of motion that are
either just perceptible or just not perceptible
is that of the so-called threshold value. In
this study only the lower absolute threshold is
to be considered. It is found that when the
velocity of a stimulus is diminished, there is
a critical level of velocity beneath which no
motion is perceived. Perception of motion
depends not only on (1) the physical velocity
of the moving stimulus, but also on such
variables as (2) form and size of the stimulus;
(3) presence or absence of fixed reference
objects, and their nature; (4) absolute and
relative brightness of the stimulus and the
background; (5) absolute and relative color of
stimulus and background; (6) light or dark
adaptation of the eye; (7) monocular or
binocular observation; (8) macular or peripheral
observation of the stimulus; (9) distance of
observation; (10) duration of the observation
period; (11) eyes allowed to move or required to
fixate; and (12) characteristics of the path of
movement. Under usual operational conditions the
observation is binocular with macular and/or
peripheral fixation, non-limited duration of
observation period, and the eyes moving in a
normal manner.

In the present experiment the variables 
were treated in the following manner: The 
relation of physical velocity of the moving 
stimulus to the absolute and relative bright- 
ness of the stimulus and background was 
measured; absolute and relative color of the 
stimulus and background, size and form of 
stimulus, adaptation of the eye, distance of 
the observation, and characteristics of the 
path of movement were all controlled. 


Method 


Apparatus. The general plan of the experi- 
mental situation has been described in the preced- 
ing study. 


Experiment II. 
The subject was seated in the three-sided booth
facing the front wall. In the front wall was a 
2 X 4 inch aperture over which was superimposed 
a double gradient neutral density filter with the 
center clear and increasing in density toward the 
ends. The stimulus grid was presented in this 
aperture and the double gradient neutral density 
filter acted to gradually blur the edges so that 
no sharp reference boundary was evident in the 
field. 

The stimulus grid consisted of high-contrast 
photographic reproductions of 2 mm. wide alter- 
nate black and white lines. The grid was a con- 
tinuous circular band, 2 inches in height and 18 
inches in circumference. It was carried on three 
rollers placed so as to form a triangle 714 inches 
on the aperture side, 514 inches and 4% inches 
on the other two sides. The rollers were 3/4 inch
rubber covered dowels. The roller not in the
aperture was driven by a 1/60 horsepower Gen- 
eral Electric A-C motor, model SKH13E 19, 
type K.J., with a friction clutch attachment in 
conjunction with a 1/100 reduction gear. The 
actual velocity of the driven shaft was recorded 
in R.P.M. by a tachometer. The velocity of the 
driven shaft could be changed continuously and 
smoothly through a range of .05 mm. per sec. 
to 2 mm. per sec. A masking motor was employed
in conjunction with the apparatus so as
to mask any possible noise cues.

The experimenter was seated at a small work 
table placed against the outside of the middle 
wall of the booth. The velocity regulating knob 
was within easy reach and the dial face of the 
tachometer was directly in front of him. On the 
table before him were located the motion appa- 
ratus described above, a Variac and a voltmeter 
for control of the subject’s lights, a carefully 
hooded lamp to provide minimal illumination 
and a data sheet to record velocities and subject 
remarks. 

The light sources and controls were as in the 
previous experiment. 

Data sheets were prepared in advance as in the 
previous experiment. 

Subjects. The same ten subjects who served 
in the preceding study were used in the present 
experiment. A month or more elapsed between 
their participation in the two experiments. 

Procedure. Each subject was allowed to be- 
come cone dark adapted before the illumination 
was turned on. With .05 foot-lamberts of bright- 
ness and the stimulus velocity very low (sub- 
liminal) the subject was instructed: 


This is an experiment to determine the influ- 
ence of varying brightnesses of illumination on 
the moving of strips. You are to look at the 
center of the screen, look at three or four 
strips and say “Now” when you see these 
strips move in a regular fashion across the 
screen. (Increase the velocity well above 
threshold.) Now the strips are moving across 


Table 3 


Showing the Mean Tachometer Readings at the Point 
of Absolute Motion Threshold and Mean 
Values for Minutes of Arc/Second 
at Five Brightness Levels 


Brightness in Foot-Lamberts 
Al 1.00 10.00 
i64 is 8 

ae | 6 ma 

14 14 6 

15 14 

13 13 

14 14 

14 14 

e 1.2 a 

9 ; is i23 

10 ‘ 1.0 1.0 


005 O01 05 


Subject 


onoauw fF wWhwe 


Mean Tach. 
Read. (RPM) 


130 1.25 
Mean, in Min. 
of Arc/Sec. ‘ MM 4 17 


oa Tach. 
SE Tach. 


0.17 
0.06 


0.10 
0.03 


0.15 
0.05 


0.13 
0.04 


the screen in a regular fashion; say “Now”
when you can’t see them moving in this regular
fashion.


Results

The data of this experiment consist of 
judgments of the presence or absence of mo- 
tion (absolute motion perception thresholds) 
for ten subjects under five different illumina- 
tion levels. 

The means of ten judgments for each sub- 
ject are presented in Table 3 which shows for 
each subject not only the mean velocity in 
R.P.M. but also in minutes of arc per sec. at 
each of the five brightness levels. 
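The conversion from a linear stimulus velocity to minutes of arc per second can be sketched as follows. This is only an illustration: the viewing distance is not given in this section, so the distance and velocity in the example call are hypothetical values, and the function name is our own.

```python
import math

ARCMIN_PER_DEGREE = 60.0


def angular_velocity_arcmin_per_sec(linear_mm_per_sec, viewing_distance_mm):
    """Angle swept per second, in minutes of arc, by a point that moves
    across the line of sight at the given linear velocity."""
    angle_rad = math.atan(linear_mm_per_sec / viewing_distance_mm)
    return math.degrees(angle_rad) * ARCMIN_PER_DEGREE


# Hypothetical values (not taken from the paper): 0.05 mm. per sec.
# seen from an assumed distance of 700 mm.
example = angular_velocity_arcmin_per_sec(0.05, 700.0)
print(f"{example:.3f} min. of arc per sec.")
```

Multiplying the result by 60 gives seconds of arc per second, the unit used in the Discussion.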

Inspection of Table 3 and Figure 2 suggests
that: (1) the absolute motion perception
threshold is markedly lower at the higher
brightness levels; (2) there is a sharp change
in motion perception performance between
.05 and .1 foot-lamberts; and (3) there is
relatively little improvement in performance
above 0.05 foot-lamberts.

Since our principal concern is with per- 
formance as a function of brightness level, a 
t analysis was carried out comparing group 
performance for each pair of brightness levels. 





Milton L. Rock 


ABSOLUTE MOTION THRESHOLD as 
A FUNCTION OF BRIGHTNESS 
A summary of this analysis is presented in
Table 4. From Table 4 it is seen that all 
differences which cross the .05 foot-lambert 
level are significant at the 1 per cent level 
while no difference that does not cross this 
value is significant at the 1 per cent level. 
Three other differences are significant at the
5 per cent level: between .005 and .01; .005
and .05; and .1 and 1.0. A supplementary
test was made on three of the subjects at 10
foot-lamberts; Table 3 shows that there is 
no difference between the means at this level 
and at 1 foot-lambert. 

On the basis of the data presented above 
it seems clear that: (1) motion perception 
performance increases sharply in the region 
of .05 foot-lamberts; and (2) relatively little 
increase in absolute motion threshold results 
from increasing brightness above this level. 

Practice Effects. The method employed in 
the investigation of practice effects was the 
same as in Experiment I. Each brightness 
level appeared in each ordinal position an 
equal number of times. A t analysis of the
several differences indicated no significant 
practice effects in this experiment. The 
analysis of the ascending and descending 
series at each brightness level shows the 
ascending series to be greater in all cases 
than the descending series; these are errors 
of habituation. These errors of habituation 
prove to be small and relatively constant for 
all brightness levels. The differences were:
.04 R.P.M. at .005 foot-lamberts, .12 R.P.M.
at .01 foot-lamberts, .18 R.P.M. at .05
foot-lamberts, .10 R.P.M. at .1 foot-lamberts,
and .08 R.P.M. at 1.00 foot-lamberts.
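The requirement that each brightness level appear in each ordinal position an equal number of times can be met in several ways. The sketch below uses a cyclic Latin square repeated over the ten subjects. The paper does not say how the orders were actually drawn up, so the cyclic scheme and the function name are assumptions made for illustration.

```python
from collections import Counter

LEVELS = [0.005, 0.01, 0.05, 0.1, 1.00]  # foot-lamberts, as listed in the text


def cyclic_orders(levels, n_subjects):
    """Give each subject the level list rotated by one more place than the
    previous subject (a cyclic Latin square, repeated as needed)."""
    k = len(levels)
    return [[levels[(s + pos) % k] for pos in range(k)]
            for s in range(n_subjects)]


orders = cyclic_orders(LEVELS, 10)

# Count how often each level falls in each ordinal position.
position_counts = Counter(
    (pos, level) for order in orders for pos, level in enumerate(order)
)
for subject, order in enumerate(orders, start=1):
    print(subject, order)
```

With ten subjects and five levels every level falls in every position exactly twice. A cyclic square balances position only; it does not balance which level immediately precedes which.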

The subjects’ comments are of interest in 
this experimental situation. Subjective re- 
ports of pulsations in a plane perpendicular 
to the movement and pulsations in the plane 
of the movement were frequent. These pulsa- 
tions developed before motion was apparent, 
but in decreasing supraliminal motion to no 
motion the pulsations were usually not re- 
ported. The subjects typically believed their 
performance to be about equally good at all 
levels of brightness tested. 


Discussion 


The results of absolute motion threshold 
perception reported above indicate a critical 
level of brightness, between .05 and .1 foot- 
lamberts, below which subjects’ absolute mo- 
tion threshold is significantly raised. Above 
this level the absolute motion threshold is a 
minimum and a further increase in bright- 
ness, at least up to 10 foot-lamberts and 
probably indefinitely, produces no further sig- 
nificant increments of performance. Again 
it seems as though once a subject has been 
given just enough brightness to perform this 
task with ease, brightness is no longer a sig- 
nificant variable. 

The absolute motion thresholds found in 
this experiment ranged from 24 secs. of arc 
per sec. at .005 foot-lamberts to 10 secs. of 
arc per sec. at 1 foot-lambert (and with three 
subjects at 10 foot-lamberts). These results 
are lower than those usually stated as typical. 
Generally one to two minutes of arc per sec. 
is the absolute motion threshold reported. 


Table 4


Values of t, Comparing Mean Tachometer Reading at
Point of Absolute Motion Threshold at
Five Brightness Levels


          .005         .01        .05        .10

 .005
 .01      [illegible]
 .05      2.69*        1.63
 .10      12.52**      12.94**    10.82**
 1.00     19.72**      14.17**    13.00**    2.70*


* Significant at 5 per cent level.
** Significant at 1 per cent level.
The source quoted is usually Aubert’s experiment,
but his results should be qualified
with, “under short fixation times,” for Aubert 
follows his results with: “. . . whereas with 
lower velocities it requires several seconds to 
detect motion” (16). With unlimited fixa- 
tion time Munch (39) reported thresholds 
as low as 34 secs. of arc per sec. and Basler 
(39) as low as 13 secs. of arc per sec. for 
foveal fixation under daylight conditions. 
This compares closely with the present re- 
sults. Thus, it would seem that under ordi- 
nary operational conditions with unlimited 
fixation time, brightness above .05 foot-lam- 
berts, binocular vision, cone adapted eyes, 
with a blurred but stationary reference in the 
visual field, and stimulus size subtending 10 
minutes of arc the absolute motion threshold 
is of the order of 10 sec. of arc per sec. Be- 
low a brightness of .05 foot-lamberts the mo- 
tion threshold increases rapidly up to 24 sec. 
of arc per sec. at .005 foot-lamberts. 


Experiment III. Depth Perception 


The two preceding experiments have pre- 
sented data concerning performance on a 
visual illusion and on a motion discrimina- 
tion task as a function of low photopic bright- 
ness levels. The present study is a further 
extension of these studies to performance on 
a depth perception task as a function of the 
same low photopic brightness levels. 

It is evident that it is impossible for the 
retinal image alone to give us a tridimensional 
perception, for the retinal image is only two 
dimensional. Stereopsis can best be consid- 
ered a unification of many visual impressions 
and among the factors utilized to a greater 
or lesser extent are: (1) size of retinal 
image, (2) aerial perspective, (3) mathematical
perspective, (4) distribution of lights
and shadows, (5) intervening objects, (6) 
convergence and accommodation, and (7) 
parallax. 

The most important factor contributing 
towards binocular stereopsis is the phenome- 
non of binocular parallax. It is the absence 
of this factor in monocular vision that renders 
the estimation of depth so difficult, especially 
with the head fixed. 


Method 


Apparatus. The experimental situation has
been described above.

For this experiment the stimulus field contained
three dull white rods, the two outer rods
fixed and the center rod movable. The rods
were separated by 3/4” and were 2 mm. in diameter.
Approximately one inch in the center region
of the rods was visible to the subject. The
background was a matte black. The movable
rod was moved by means of a rack and pinion by
the experimenter. Brightness was controlled as
in the previous experiments.

Subjects. The same ten subjects who served
in the previous studies were used in the present
study. Two weeks or more elapsed between their
participation in this experiment and the preceding
one.

Procedures. Each subject was allowed to become
cone dark adapted (approximately 10 minutes)
before the illumination was turned on. The
instructions were as follows:


This is an experiment to determine the influ- 
ence of varying brightnesses of illumination on 
the performance of depth perception. You 
are to look at the three white rods. The cen- 
ter one will be moved back and forth; you are 
to tell me when you see it in the same plane 
as the other two. Now the center rod is back 
of the other two. Say “equal” when it ap- 
pears to be in the same plane as the two fixed 
rods; now tell me when it appears in front of 
the other rods. Now the rod is well in front 
of the other two rods. Say “equal” when it 
appears to you to be in the same plane as the 


Table 5 


Showing the Mean Constant Errors in Depth 
Perception in mm. Under Each of the 
Five Levels of Brightness 


Brightness in Foot-Lamberts 


Subjects d 01 05 a 


2.4 —, 
3.9 


1.00 


3.2 
1.2 
3.5 

2.1 

1.1 - 

1.7 9 
2.8 pe 
3.5 - J 
0.20
0.52 
0.17 


0.18 
0.47 
0.16 
others. Now tell me when it appears behind
the other two rods. 


Five ascending (toward observer) and five de- 
scending (away from observer) trials were given 
before the formal trials were begun; this fore-test 
served to reduce the practice effect. 

On the formal trials each subject was given five 
ascending and five descending trials at each of 
five brightness levels. The method used was the 
traditional method of limits. 

The levels of brightness were the same as the 
preceding studies. 


Results 


The data of this experiment consist of con- 
stant errors, and average errors (non-alge- 
braic), made by the group of ten subjects 
under the five levels of brightness employed. 
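The two error measures named above differ only in whether the sign of each error is kept. A minimal sketch, using made-up settings rather than data from the experiment (the function names and the sample values are hypothetical):

```python
def constant_error(settings_mm, true_mm=0.0):
    """Algebraic mean of the signed errors: over- and under-settings cancel."""
    errors = [s - true_mm for s in settings_mm]
    return sum(errors) / len(errors)


def average_error(settings_mm, true_mm=0.0):
    """Mean of the absolute (non-algebraic) errors: direction is ignored."""
    return sum(abs(s - true_mm) for s in settings_mm) / len(settings_mm)


# Hypothetical settings of the movable rod in mm.; positive values stand
# for a rod set in front of the plane of the two fixed rods.
hypothetical_settings = [3.0, -1.0, 2.0, 4.0]
print(constant_error(hypothetical_settings))
print(average_error(hypothetical_settings))
```

A subject can therefore show a small constant error and a large average error at the same time, if settings scatter widely on both sides of the true position.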

The constant error data are presented in 
Table 5, which shows for each subject the 
mean of 20 judgments at each brightness 
level. Mean values for the group are thus 
based on 200 judgments at each level. The 
mean values from Table 5 are shown in Fig- 
ure 3 as a function of log brightness in foot- 
lamberts. 

Inspection of Table 5 and of Figure 3 in- 
dicates that: (1) performance is markedly 
more accurate at the higher brightness levels; 
(2) there is a relatively sharp change in per- 
formance between .01 and .05 foot-lamberts; 
and (3) the mean constant errors are all positive
in direction, i.e., at the judgment of
equality the variable is in all cases in front
of the two fixed rods. A t analysis was carried
out comparing group performance for each
pair of brightness levels and a summary of
Table 6


Values of t, Comparing Constant Errors in Depth 
Perception at Five Brightness Levels 





Brightness in Foot-Lamberts


.005 .01 .05 .1 1.00





005 —_ 

O1 1.81 

05 lao 

l oan" 
1.00 7.76** 


6.88** 
8.14** 
8.93** 


1.33 
0.76 





** Significant at 1 per cent level. 


this analysis is presented in Table 6. From 
this table it is seen that all differences which 
cross the .01 foot-lambert level are significant 
at the 1 per cent level, while no difference that 
does not cross this value is significant at the 
5 per cent level. 

The data for average errors are presented
in Table 7 and the t analysis in Table 8.
Figure 3 presents the results graphically. The 
results are on the whole the same for the two 
kinds of error analysis. The only noticeable 
difference is that the average errors continue 
to decrease above the critical brightness level 
while the constant error values change very 
little above this point. Although (see Table 
8) the differences between .05 and .1 and be- 


Table 7 
Showing the Mean Average Errors in Depth 
Perception in mm. Under Each of the 
Five Levels of Brightness 
Brightness in Foot-Lamberts 
005 05 
5 > 4.4 
4.8 ’ 2.9 
5.2 . 3.3 
3.4 
3.0 
3.6 
4.6 
5.9 
4.6 
10 5.6 


Subjects 


Mean 5. 4.1 
a eo 1.0 
SE 0.3 
Table 8
Values of t, Comparing Average Errors in Depth
Perception at Five Brightness Levels


Brightness in Foot-Lamberts 


005 O1 O05 1 


0.98 
4.53** 
657" 


5.44** 


ane 
12.50** 1.00 
‘iw sar" 


** Significant at 1 per cent level. 


tween .1 and 1.00 foot-lamberts are not sig- 
nificant, the difference between .05 and 1.00 
is significant at 1 per cent level. 

Analysis of the above data shows that: (1)
accuracy of depth perception performance
decreases sharply below .05 foot-lamberts
and (2) little increase in accuracy results
from increases in brightness above this level.

The results of this experiment, expressed
in terms of angle of binocular parallax, are
shown in Table 9.
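A depth error in millimetres can be restated as an angle of binocular parallax by taking the difference between the convergence angles at the two distances. The sketch below is illustrative only: the viewing distance and the interocular separation are not given in this section, so the 65 mm. default and the values in the example call are assumptions, and the function names are our own.

```python
import math

ARCSEC_PER_DEGREE = 3600.0


def parallax_arcsec(depth_error_mm, viewing_distance_mm, interocular_mm=65.0):
    """Difference in convergence angle between a point at the viewing
    distance and a point depth_error_mm farther away, in seconds of arc."""
    near = 2.0 * math.atan(interocular_mm / (2.0 * viewing_distance_mm))
    far = 2.0 * math.atan(
        interocular_mm / (2.0 * (viewing_distance_mm + depth_error_mm))
    )
    return math.degrees(near - far) * ARCSEC_PER_DEGREE


def parallax_arcsec_small_angle(depth_error_mm, viewing_distance_mm,
                                interocular_mm=65.0):
    """Small-angle approximation: a * delta_d / d**2, converted to arcsec."""
    radians = interocular_mm * depth_error_mm / viewing_distance_mm ** 2
    return math.degrees(radians) * ARCSEC_PER_DEGREE


# Hypothetical case: a 10 mm. depth error seen from an assumed 6,000 mm.
print(parallax_arcsec(10.0, 6000.0))
print(parallax_arcsec_small_angle(10.0, 6000.0))
```

For depth errors that are small relative to the viewing distance the two functions agree closely, which is why the approximation is commonly used.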

Practice Effects. Practice or “warm-up”
effects were analyzed as in the other experiments.
A t test analysis of the several differences
indicates no evidence of a practice
effect for the experiment.

The subjects’ comments were of interest 
in that seven of the ten subjects made some 
reference to the apparent decrease in tri- 
angularity. All these seven subjects stated 
in various ways that the center movable rod 
was equal when the apparent triangularity 
was zero and that the fixed rods were used as
the base standard.


Discussion 


The results of the constant and variable 
error reported above clearly indicate that for 
depth judgments there is a critical level of 
brightness between .01 and .05 foot-lamberts. 
Depth perception is relatively difficult below 
this level. Above this value the task becomes 
suddenly easier and both constant and varia- 
ble errors decrease markedly. Further in- 
creases in brightness—at least up to 1.0 foot- 
lamberts and probably indefinitely—produce 
no further increments of performance. 

It is interesting to note that although Mueller
and Lloyd (20) state that stereoscopic
acuity decreases in a regular fashion as in- 
tensity decreases, if one plots their results so 
that the actual data points are connected and 
the curve not smoothed a sharp break appears 
at approximately the same brightness level as 
in the present experiment. It is of interest 
also to note that the trend of their results 
would be highly similar to that of the present 
experiment, with performance leveling off 
above approximately .05 millilamberts. 

The absolute binocular parallax values 
found in this experiment for some subjects 
are lower than reported by the earlier experi- 
menters. Bourdon’s (2) 5 seconds was the 
lowest difference reported. Six of our ten 
subjects had five seconds or less at brightness 
level above the critical point of .05 foot- 
lamberts. The variability of these at the 
adequate performance brightness levels was 
great. 

These results add another item to our 
knowledge of critical visual performance levels 
as a function of low photopic brightness levels. 
Depth perception as well as dial reading per- 
formance, illusion errors and absolute motion 
thresholds appear to have approximately the 
same critical level of brightness, above which 
performance is adequate and an increase in 
brightness has relatively little effect and be- 
low which performance is relatively poor. 


Experiment IV. Addition Task 


A simple quantifiable mental task was in- 
vestigated to add to the picture of visual per- 
formance at low photopic brightness levels. 
Reading tasks are the first to come to mind, 
but two serious objections are inherent in the 
use of reading material. First, only time 


Table 9


Showing the Mean Constant Errors and Average Errors
as Angles of Binocular Parallax


Brightness, in     Constant Error    Average Error
Foot-Lamberts      (Sec. of Arc)     (Sec. of Arc)

0.005                   90               160
0.01               [illegible]           155
0.05                     5               105
0.1                [illegible]           100
1.0                      4                90
scores can be made with any reliability, and
second, at low photopic levels the material 
would have to be made abnormally large in 
order to avoid the factor of visual acuity as 
a limiting variable. Two of the most produc- 
tive investigators in this field, Tinker and 
Luckiesh, have reported numerous studies, of 
which the two following are typical. Tinker 
(34), investigating reading of 10 point type 
at illuminations ranging from 0.1 to 53 foot 
candles (accuracy held constant), found speed 
of reading to increase rapidly from 0.1 to 3.0 
foot candles and no change in speed of read- 
ing between 3.0 to 53.0 foot candles. No 
change in accuracy of reading was reported. 
Luckiesh and Moss (18) in correlating illumi- 
nation intensity and nervous muscular tension 
resulting from reading 12 point type (large 
type) found the critical intensity level to be 
somewhere between 1 and 10 foot candles. 

Addition tasks have been used by many 
investigators as an active mental task. 
Thorndike (31) used addition scores of time 
and accuracy to investigate practice. Davis 
(8) used addition tasks to investigate the 
effect of noise on mental work. Rounds, 
Schubert, and Poffenberger (26) used addi- 
tion tasks to investigate the effects of practice 
upon the metabolic cost of mental work. 
Freeman (13) employed mental arithmetic 
as a mental task and measured it as a func- 
tion of spread of neuromuscular activity. 
Addition tasks have also been used by many 
other investigators as a mental task. Atkins 
(1) used arithmetic problems of the cancel- 
lation type as a mental task and measured it 
as a function of illumination. His range was 
from 9.6 to 118 foot candles. Performance 
measured by achievement was identical at all 
levels of brightness. 

In the present study performance as meas- 
ured by both speed and accuracy in addition 
problems was investigated as a function of 
low photopic brightness levels. 


Method 


Apparatus. The basic experimental situation
has been described in the preceding experiments.
In the stimulus position for this experiment was 
a stimulus card carrier which slid in horizontally 
placed brass tracks. It was double (11 x 14 
inches) so that as one stimulus object was slid 
out of the subject’s view, another stimulus object
came immediately into view. Micro-switches
at each end of the track were arranged so that 
illumination on the stimulus object went off as 
the carrier was moved from one position and 
came on as it reached the other position. In this 
way the shift from one stimulus object to an- 
other was accomplished rapidly in a short in- 
terval of darkness and did not require the sub- 
ject to make any shift in visual orientation. 
Thus, the subject was kept steadily at the chosen 
level of illumination throughout a series of read- 
ings, except for an instant of darkness between 
the presentation of stimulus objects. 

Materials. The stimulus objects, generously 
made available to this project by Dr. Mason 
Crook and Sam McLaughlin of Tufts College 
Air Forces project, consisted of high-contrast 
photographic reproductions of 10 point monotype 
numbers arranged into 100 items per chart, each 
item consisting of a 3-digit problem and its 2- 
digit sum arranged horizontally. There were five 
items per group, four groups per column, and five 
columns per chart. Each chart was reproduced 
so as to have white figures on a black back- 
ground and size was increased two-fold so that 
each digit had an over-all height of 3/16 inches. 
The problems were selected with predetermined 
specifications as to random numbers, repetitions, 
zeros, sums, et cetera. 

Subjects. The same subjects of the previous 
experiments participated in this experiment. A 
week or more passed between this experiment 
and the preceding experiment. 

Procedure. Each subject was given 10 charts 
at a brightness of 10 foot-lamberts on the day 
preceding the day of the formal trials; this fore- 
test served to reduce practice effects. 

On the formal trials each subject did two 
charts (200 problems) at each of five brightness 
levels. While each subject was becoming cone 
dark adapted the following instructions were 
read: 


This is an experiment to determine the influ- 
ence of varying brightnesses of illumination on 
the performance in simple addition problems. 
There are 5 columns of numbers, each column 
separated after 5 problems. The first three 
numbers are to be added up and if their total 
equals the fourth number, you are to say 
“right”; if they add up to some other number 
you are to say “wrong.” You will say “space” 
after each 5 problems to indicate the spaces 
on the charts; you are to say “new column” 
when you start each new column. 

When I say “ready” the lights will go off (sub- 
ject is preadapted to brightness level to be 
used by looking at a matte black chart illumi- 
nated with this brightness) and in a moment 
they will come on again. When the lights 
come on, you are to add the numbers as di- 
rected and say “right” or “wrong” to each 
problem; remember to say “space” after each 
Table 10


Showing the Total Number of Problems in Error in 200 
Addition Problems at Each of Five 
Brightness Levels 

Note: The task was impossible for all subjects at
.005 foot-lamberts.





Brightness in Foot-Lamberts 


Subjects 008 O1 . ‘ 


1 
29 2 
36 1 
21 ( 
54 1 
6 1 
26 5 
34 3 
55 4 
20 23 2 
37 24 1 


Rew we ww 


w 


Mean 40.5 
g 15.84 
SEu 5.28 


30.8 
14.16 
4.72 ‘ 5 


set of 5 problems and remember to say “new 
column” when going from one column to the 
next one. You are to add down the first 
column on the left and proceed to the right. 
Add as rapidly and as accurately as you can. 


The brightness levels were originally the same 
as the preceding experiments, but all subjects 
found the task impossible under the .005 foot- 
lambert level. The lower level was raised to 
.008 foot-lamberts, at which brightness better 
than chance results were obtained. The levels 
finally used were .008, .01, .05, .1, and 1.00 foot- 
lamberts. A piece of unexposed but developed 
paper from the same stock as the stimulus charts 
was used as an object for this standardization. 

Controls to balance out practice and fatigue 
effects were the same as in the previous experi- 
ments. 

Each subject reported for two sessions on con- 
secutive days. At the first session subjects were 
given the fore-test and on the second day the 
formal trials were given. Subjects were given no 
knowledge of results during the entire experiment. 


Results 


The data of this experiment consist of error 
scores and time scores made by the same 
group of 10 subjects under 5 brightness levels. 
It is interesting to note that at the .005
foot-lambert level all subjects felt the task to
be impossible and refused to attempt it,
reporting that the task would be guesswork and
that they feared injury to their eyes. The lowest
level was raised to .008 foot-lamberts, where all
subjects did better than chance, although still
reporting difficulty at this low level.
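What "better than chance" means for a right/wrong judgment can be illustrated with a binomial tail probability. The sketch assumes that a subject who merely guessed would err on each problem with probability one half; the paper does not state the proportion of charts items with correct sums, so that figure is an assumption, and the error count of 40 is a hypothetical value chosen near the group mean legible in Table 10.

```python
from math import comb


def prob_at_most(errors, n_problems=200, p_error=0.5):
    """P(X <= errors) for X ~ Binomial(n_problems, p_error): the chance that
    pure guessing would produce this few errors or fewer."""
    return sum(
        comb(n_problems, k) * p_error ** k * (1 - p_error) ** (n_problems - k)
        for k in range(errors + 1)
    )


# Hypothetical score: 40 problems in error out of 200.
print(prob_at_most(40))
```

Under this assumption a guessing subject would be expected to make about 100 errors in 200 problems, so a score in the neighborhood of 40 errors is very unlikely to arise by guessing alone.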

Errors. The principal analysis of errors 
is in terms of error frequency, i.e., the num- 
ber of problems in error in 200 problems. 
These data are presented in Table 10, which 
shows the total number of problems in error 
in 200 addition problems at each of the five 
brightness levels. Mean values of the group 
are thus based on a total of 2,000 readings at 
each level. The mean values from Table 10 
are also shown as the accuracy curve of Fig- 
ure 4. 

Inspection of Table 10 and Figure 4 indicates
that: (1) performance is markedly
more accurate at the higher brightness levels;
(2) there is rapid improvement in performance
up to .05 foot-lamberts; and (3) there is little
or no improvement above .05 foot-lamberts.

A t analysis was carried out comparing
group performance for each pair of bright- 
ness levels. A summary of this analysis is 
presented in Table 11. From this table it is 
seen that all differences between .05 foot- 
lamberts and lower brightness levels are 
highly significant (1 per cent level) and all 
differences between .05 foot-lamberts and 
higher brightness levels are not significant at 
the 5 per cent level. 

From this analysis it seems clear that: (1) 
accuracy of performance in doing addition 
problems increases sharply in the region be- 
tween .01 and .05 foot-lamberts; and (2) 
little or no increase in accuracy results from 
increases in brightness above this level. 
Table 11


Values of t, Comparing Mean Number of Errors in 200 
Addition Problems at Five Brightness Levels 





Brightness in Foot-Lamberts 


008 01. 05 “a 1.00 


008 _— — —_ 
Ot 3.28** 
O5 ian 
a jaa 
1.00 7.40** 


6.84** 
6.26** 
6.40** 


1.26 
1.48 


** Significant at the 1 per cent level. 


Time. Data on the speed of performance 
in doing addition problems at the brightness 
levels studied consist of the time in seconds 
required to do 200 addition problems at each
brightness level. Each subject did two charts 
of 100 problems each at each brightness level. 
Table 12 presents the total time required to 
do 200 problems for each subject at each 
brightness level, and also the group means. 
The mean values from Table 12 are also 
shown as the speed of response curve of
Figure 4. 
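The group means can be recomputed from the subject totals printed in Table 12, and the printed standard errors can be compared with the printed standard deviations. In the sketch below the times and the reported summary rows are copied from Table 12. The formulas are assumptions: the printed SE values equal the printed SD values divided by 3 (the square root of N - 1 for ten subjects) to within rounding, which suggests but does not prove that this was the formula used, and the divisor used for the SD itself is not stated.

```python
from math import sqrt

# Total time in seconds for 200 problems, ten subjects, from Table 12.
TABLE_12 = {
    ".008": [1569, 1189, 1019, 964, 622, 1083, 1394, 770, 727, 1344],
    ".01": [1422, 903, 839, 1099, 571, 1007, 1046, 865, 694, 1105],
    ".05": [687, 348, 419, 420, 414, 367, 502, 414, 448, 476],
    ".1": [623, 358, 416, 359, 415, 389, 446, 370, 424, 426],
    "1.00": [634, 301, 338, 298, 406, 356, 426, 332, 411, 418],
}

# (mean, SD, SE) as printed beneath Table 12.
REPORTED = {
    ".008": (1068.1, 294.2, 98.1),
    ".01": (955.1, 226.3, 75.4),
    ".05": (449.5, 90.9, 30.3),
    ".1": (422.6, 72.7, 24.2),
    "1.00": (392.0, 92.6, 30.8),
}


def mean(values):
    return sum(values) / len(values)


def population_sd(values):
    """SD with divisor N (an assumption about the paper's convention)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


for level, times in TABLE_12.items():
    sd = population_sd(times)
    se = sd / sqrt(len(times) - 1)  # assumed formula: SD / sqrt(N - 1)
    print(level, round(mean(times), 1), round(sd, 1), round(se, 1),
          "reported:", REPORTED[level])
```

The recomputed means reproduce the printed means. The recomputed standard deviations need not match the printed ones exactly, since the original computing convention is not known.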

Table 12


Showing the Total Time in Seconds Required to Do
Addition Task for 200 Problems at Each
of Five Brightness Levels


              Brightness in Foot-Lamberts

Subjects    .008      .01      .05      .1     1.00

   1        1569     1422      687     623      634
   2        1189      903      348     358      301
   3        1019      839      419     416      338
   4         964     1099      420     359      298
   5         622      571      414     415      406
   6        1083     1007      367     389      356
   7        1394     1046      502     446      426
   8         770      865      414     370      332
   9         727      694      448     424      411
  10        1344     1105      476     426      418

Mean      1068.1    955.1    449.5   422.6    392.0
σ          294.2    226.3     90.9    72.7     92.6
SE          98.1     75.4     30.3    24.2     30.8


Table 13


Values of t, Comparing Mean Time in Seconds to Do
200 Addition Problems at Five
Brightness Levels


              Brightness in Foot-Lamberts

          .008          .01           .05           .1

 .01      2.56*
 .05      8.48**        8.50**
 .1       [illegible]   [illegible]   [illegible]
 1.00     8.08**        [illegible]   [illegible]   [illegible]


* Significant at 5 per cent level.
** Significant at 1 per cent level.


Figure 4 shows that the curve for time
scores has the same general shape as the error
curve. There appears to be a sharp increase
in performance between .01 and .05 foot-lamberts.
The t analysis in Table 13 shows that
all differences of means are significant at the
5 per cent or 1 per cent level. The difference
between .008 and .01 foot-lamberts is
significant at the 5 per cent level, and the
difference between .05 and .1 foot-lamberts is
significant at the 5 per cent level; all the
other differences are significant at the 1 per
cent level. Thus, it can be seen that even
though the general results found for time
agree with those for errors, there is evidence
to support the statement that performance
as measured by speed of doing addition tasks
increases significantly with increased brightness
up to 1.0 foot-lamberts.
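A comparison of two brightness levels for the same ten subjects can be sketched as a paired t test on the Table 12 times. The paper does not say which form of the t formula was used, so the paired (difference-score) form below is an assumption, and its result should not be expected to reproduce Table 13 exactly: for .008 against .01 foot-lamberts it gives roughly 2.27, where Table 13 prints 2.56. The two critical values are the standard two-tailed values of t for 9 degrees of freedom.

```python
from math import sqrt

# Subject totals in seconds at the two lowest levels, from Table 12.
TIMES_008 = [1569, 1189, 1019, 964, 622, 1083, 1394, 770, 727, 1344]
TIMES_01 = [1422, 903, 839, 1099, 571, 1007, 1046, 865, 694, 1105]

# Two-tailed critical values of t for 9 degrees of freedom.
CRITICAL_5_PER_CENT = 2.262
CRITICAL_1_PER_CENT = 3.250


def paired_t(first, second):
    """t for the mean of the per-subject differences (N - 1 degrees of freedom)."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = sum(diffs) / n
    variance = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    return mean_diff / sqrt(variance / n)


t_value = paired_t(TIMES_008, TIMES_01)
print(round(t_value, 2))
print("beyond 5 per cent value:", abs(t_value) > CRITICAL_5_PER_CENT)
print("beyond 1 per cent value:", abs(t_value) > CRITICAL_1_PER_CENT)
```

The same function can be applied to any other pair of columns once their times are entered in the same subject order.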

Practice Effects. It will be recalled that 
each subject did 1,000 addition problems be- 
fore formal trials were begun, in order to re- 
duce practice effects. As in the previous ex- 
periments, in order to determine whether 
practice effects were playing a significant role, 
the error and time scores were tabulated for 
each subject in terms of first brightness level 
tested, second level tested, etc. The t test
analysis showed no evidence of practice effects. 


Discussion 


The results of the error scores indicate 
clearly that there is a critical level of bright- 
ness below which subjects find it difficult to 
perform this addition task. Above this level 
further increases in brightness, at least up to 
1 foot-lambert and very probably indefinitely, 
produce no significant increments of perform- 
ance. These results agree completely with 
those from the preceding experiments. 
The results of the time scores, although
comparable to the error scores in general 
trend, indicate that speed of performance in- 
creases as brightness increases, at least up 
to 1.00 foot-lambert. The increase shown by 
the significance of the several differences is 
not a steady increase but has its greatest rate 
between .01 and .05 foot-lamberts. 

These findings agree with the findings of 
the three preceding experiments in showing 
that performance in an active mental task as 
a function of brightness shows a critical value 
at the same general level of brightness. This 
critical brightness level is between .01 and 
.05 foot-lamberts which is considerably lower 
than the value of 3 foot-lamberts usually re- 
ported as critical for reading performance. 
In such studies speed of reading has usually 
been the criterion of performance. It has 
already been noted that visual acuity may 
complicate the picture of reading at low pho- 
topic levels and it has been noted that in some 
studies accuracy did not change over a range 
of .1 to 53 foot-candles when words were large 
enough to be read. The time scores in this 
study indicate an increase of performance up 
to 1 foot-lambert, which was the highest value 
used, but this increase was a differential in- 
crease with slower rates between .05 to .1 to 
1.0 foot-lamberts, and a sharp increase be- 
tween .01 and .05 foot-lamberts. Error scores 
in this study decreased rapidly up to .05 foot- 
lamberts and then leveled off, showing no sig- 
nificant decrease for the higher levels of 
brightness. 


Over-all Discussion 


The present experiments have been con- 
cerned with visual perceptual performance as 
a function of low photopic brightness levels. 
The functions found, like those of earlier dial 
reading studies from this laboratory, differ 
markedly from the functions which have fre- 
quently been reported from studies of the 
effects of brightness on visual acuity and 
foveal flicker fusion frequency. These latter 
functions have been found by numerous inves- 
tigators to increase steadily in a manner pro- 
portional to the logarithm of the stimulus 
intensity. 

In contrast the results from the four experiments
reported here, as well as from the
preceding dial reading experiments, indicate 
that performance improves as stimulus in- 
tensity increases only up to a certain point 
(approximately 0.05 foot-lamberts, depending 
somewhat on the specific task employed). 
Beyond this point increases in stimulus in- 
tensity are relatively unimportant in these 
experiments, the increments in performance 
being small and non-significant. 

These findings raise a number of interesting
theoretical questions with respect to the
physiological mechanisms and inter-relations 
which may be hypothesized to explain the 
present results. A detailed account of hy- 
potheses which might account for these find- 
ings, and especially of a possible rod-cone 
facilitation and inhibition relationship, will 
not be presented here. Such an account has 
been developed elsewhere in some detail.¹

In passing it may be of interest to note that 
a somewhat analogous situation is found in 
the field of audition. When per cent word 
articulation is plotted against stimulus in- 
tensity, there results a performance curve 
with sharp increase in the 10 db. region and 
a leveling off at about 20 db. (24). Myers 
and Harris (21) investigating the emergence 
of a tonal sensation with frequencies from 
500 to 14,000 cps. found a “zone of detecta- 
bility” (intensity area between a 50 per cent 
detection threshold and a 50 per cent pure 
tone threshold) between 2 to 4 db., inde- 
pendent of frequency. In their experiment 
with frequency matching, performance im- 
proved with increase in intensity only up to 
the level of 10 db. It would seem from 
these studies that in audition, as well as in 
visual tasks, there is a critical sensation level 
below which performance is increasingly poor
and above which increases in stimulus in- 
tensity do not increase appreciably the sub- 
ject’s performance. 

From a practical standpoint the results of 
the present experiments suggest certain mini- 
mum values for adequate performance of 
visual tasks. From the present study and 
other available sources we can summarize for
a variety of visual tasks critical brightness
levels, below which performance is impaired:


Müller-Lyer illusion                          between .01 and .05 F.L.
Depth perception                              between .01 and .5 F.L.
Motion discrimination                         between .05 and .1 F.L.
Addition task                                 between .01 and .05 F.L.

Dial reading                                  approx. .02 F.L.

Critical fusion frequency (cone)              .05 F.L.
Span of apprehension (.032 sec. to 1 sec.)    .1 to .05 F.L.
Panel indicator lights                        between .01 and .1 e.f.c.
Form silhouettes                              above .003 e.f.c.


1 In the original from which this report was rewritten,
on file as a doctoral dissertation in the University
of Rochester Library.


In view of the above findings it might seem 
advisable to consider .05 to .1 foot-lamberts 
(equals .1 e.f.c.), which is one of the highest 
values given, to be the limiting values to be 
used in practical situations. This indicates 
that in situations where the maximum quality 
and quantity of performance is required with 
the minimum brightness a value of approxi- 
mately .05 to .1 foot-lamberts should be em- 
ployed. Lighting of airplane cockpits, auto- 
motive and rail operator compartments and 
other situations which require good visual 
performance in the operator’s compartment 
plus adequate dark adaptation permitting for 
good form discrimination are situations to 
which this finding is relevant. 

Somewhat aside from the present data but 
as a rather interesting extension is the pro- 
posed use of a flood-type light yielding this 
critical brightness level in the operator’s com- 
partment (so situated as not to give reflec- 
tions from the windshield, etc.). This should 
serve to raise the adaptation level of the eyes 
to the critical level where form and silhouette 
discrimination is adequate. On-coming head- 
lights or disturbing flashes of various types 
should have less “blinding” or “dazzle” ef- 
fect because the adaptation change of the eyes 
would be less than that which is now required 
(from dark or near dark adaptation to bright 
on-coming lights or to flashes of lights). 
Since the visual performance that is needed 
by the operator is one of form, silhouette, 
depth perception, motion acuity and minimization
of illusion, etc., his visual performance
outside the compartment should
also improve or at least not be impaired. An 
emphasis on the physiological cause of “dazzle”
rather than the changing of the physical
and optical constituents of headlights, wind- 
shields, gun flashes, etc., may be a more fruit- 
ful approach to the problem. 


Summary 


A systematic investigation of performance 
in visual tasks as a function of low photopic 
brightness levels was attempted. Four types 
of visual tasks were investigated: judgment 
of magnitude of an illusion, absolute thresh- 
old for motion, depth perception and a simple 
addition task. All tasks were investigated 
under five brightness levels in the range of 
.005 foot-lamberts to 1.00 foot-lamberts. In 
each of the experiments, critical brightness 
levels were found below which performance 
was increasingly poor. Increased brightness 
above the critical level improved performance 
relatively little or not at all. The critical 
level for motion threshold was .1 foot-lam- 
berts; for the other tasks approximately .05 
foot-lamberts. It was suggested that for 
maximum performance on visual tasks, with 
minimum brightness, illumination should be
adjusted to yield brightness values of .05 to 
.1 foot-lamberts. 


Received May 28, 1953. 
Early publication. 
Applied Psychology in Action


Evaluating Supervisory Training at the Job Performance Level 


Theodore R. Lindbom 
Personnel Department, Midland Cooperative Wholesale, Minneapolis, Minn. 


Evaluation of college and university courses, 
of practical necessity, ordinarily ends with the 
semester-end examination. The assumption 
is made, rightly or wrongly, that performance 
in the examination is correlated with perform- 
ance in future situations where course content 
can and should be applied. This is the re- 
port of the results of an attempt to evaluate, 
beyond the classroom level, the performance 
of a group of University of Minnesota General 
Extension Division students who had taken a 
course in supervision. The course is a discus- 
sion type course in human relations for super- 
visors, with emphasis on the recognition of 
individual differences and the “human ele- 
ment” in supervision, which runs for a semes- 
ter and consists of 16 evening meetings each 
1½ hrs. in length. Practically all students
were employed full time with about two-thirds 
in supervisory capacities. The group studied 
were students during the spring and fall
semesters of 1950 and the spring semester of 
1951 in 5 different sections of the class all 
taught by the writer. This evaluation was 
made in addition to traditional classroom ex- 
aminations and test-retest with the standard 
test, “How Supervise?”.

A mailed questionnaire, sent in March, 
1952 with one follow-up, produced 66 returns 
from 129 students. Of these 66, 41 were from 
persons in supervisory jobs. The analysis of 
these 41 returns is presented here. 
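The size of the sample behind the analysis can be put in proportion with a few lines of arithmetic. This is only an illustrative sketch: it assumes that questionnaires went to all 129 students, and the variable names are hypothetical.

```python
# Figures reported in the text.
students_contacted = 129   # assumed to equal the number of questionnaires mailed
returns = 66
supervisor_returns = 41

return_rate = returns / students_contacted
supervisor_share = supervisor_returns / returns

print(f"Return rate: {return_rate:.1%}")
print(f"Share of returns from supervisors: {supervisor_share:.1%}")
```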

The 2 major questions asked were con- 
cerned with behavior changes at two levels: 
(1) changed behavior of the supervisor on- 
the-job; and (2) changed behavior of the 
people he supervised resulting from changed 
methods of supervision. 

In answer to the question, “Is there any- 
thing that you are now doing differently—as 
a foreman or supervisor—because of your ex- 
perience in these discussions?” 63% answered 
yes, 27% no, and 10% did not answer or 
were undecided. Typical of the comments in
answer to this question were: 

“More consideration of the employees’ 
problems.” 

“When employees are by-passed for up- 
grading, an explanation is given them.” 

“I am spending more time determining the
facts when grievances arise.” 

“Realizing the personality differences in 
people and using that in dealing with people.” 

In answer to the question, “Is there any- 
thing about the people you supervise now that 
is different from the way they were before— 
anything that has resulted from changes in 
your operations due to taking ‘Elements of 
Supervision’?” 44% answered yes, 29% no, 
and 27% did not answer or were undecided. 
Typical of the comments made when respond- 
ents were asked to describe these changes in 
the people they supervised were: 

“Morale is better than in other divisions, 
not to speak of increased efficiency.” 

“They show more of an attitude of work- 
ing ‘with’ rather than ‘for.’ ” 

“My employees come to me for help in al- 
most any type of problem.” 

“Lower costs of operation due to willing- 
ness to cooperate with their foreman and 
themselves.” 
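The percentages reported for the two questions can be converted back into approximate numbers of respondents. The sketch below assumes that each percentage is based on the 41 supervisory returns, which the text implies but does not state outright. The names `RESPONSES` and `implied_counts` are hypothetical.

```python
SUPERVISOR_RETURNS = 41

# Percentages reported for each of the two major questions.
RESPONSES = {
    "own behavior changed": {"yes": 63, "no": 27, "no answer or undecided": 10},
    "behavior of people supervised changed": {"yes": 44, "no": 29, "no answer or undecided": 27},
}


def implied_counts(percentages, n):
    """Round each percentage of n to the nearest whole respondent."""
    return {answer: round(pct * n / 100) for answer, pct in percentages.items()}


for question, percentages in RESPONSES.items():
    counts = implied_counts(percentages, SUPERVISOR_RETURNS)
    print(question, counts, "total:", sum(counts.values()))
```

Under this assumption both sets of counts sum to 41, which suggests the reported percentages are internally consistent.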

Although results indicate that the course 
was successful, at least to some degree, the 
study design permits neither definite conclu- 
sions nor generalization. The group to begin 
with was a highly selected one, and the per 
cent return of questionnaires is low enough 
to allow an additional selection factor to be 
operating. Time between completion of the 
course and filling out the questionnaire ranged
from 10 to 22 months. Conscious or uncon- 
scious misrepresentation of facts by respond- 
ents is also a possible factor. 

Because of these limitations, the study is 
not reported for its specific findings or for 
generalization. Instead, it is presented as an
illustration of an easy-to-make evaluation at 
a level beyond the traditional classroom examination
which appears to measure more directly
the kind of behavior change which the
course was intended to bring about—the on- 
the-job behavior of the supervisor and the 
people he supervises. 


Criterion Rationale for a Personnel Research Program 


Theodore R. Vallance, Albert S. Glickman, and George J. Suci


American Institute for Research 


The authors have been engaged in setting 
up and putting into operation a program for 
personnel research with naval officers. In any 
activity of this kind, it is vital to develop a 
rational framework or “research constitution” 
for the program. Since it appears that the 
framework we have developed contains many 
elements which have general applicability in 
the planning of personnel research in other 
educational, industrial, or military settings, 
the substance of our criterion rationale is pre- 
sented here. 


Ultimate and Intermediate Criteria 


When initiating a program of personnel re- 
search we must first ask: What is the nature 
of the criterion? Until the criterion is de- 
fined, assessment of the program, or any of 
its parts, is not possible. The answer to the 
question is reflected most ultimately in terms 
of furthering the objectives of the organiza- 
tion. For the Navy the long-run goals are: 

“1. To defend and support the Constitution 
of the United States against all enemies. 

2. To maintain, by timely and effective 
military action, the security of the United 
States, its possessions and areas vital to its 
interest. 

3. To uphold and advance national poli- 
cies of the United States. 

4. To safeguard the internal security of 
the United States.” ²

1 This work was done by the Officer Personnel Research
Project, at the U. S. Naval Schools Command,
Newport, R. I. under contract Nonr 890(00) between
the American Institute for Research and the Office of
Naval Research.

2 Key West Agreement, 1948.


The “success” of any organization is measured
by the extent to which its objectives are
achieved. However, the fulfillment of “ulti- 
mate” organizational aims can seldom, if ever, 
be directly assessed. Other outcomes, less re- 
mote, more susceptible to measurement, must 
normally be used to evaluate day-to-day 
operations. Historical hindsight and logical 
analysis are the usual standards for assuming 
correlation between the ultimate criterion and 
subordinate “intermediate” criteria. Contri- 
butions to success are then measured at many 
levels presumed to be correlated with the more 
ultimate criterion. 

The practical research problem, then, is to 
determine the highest organizational level at 
which quantifiable measures considered to be 
reflections of personnel behaviors are possible.
In our naval model, ships or command
units ashore comparable in size, complexity, 
and autonomy, represent the organizational 
entities upon whose efficiency the performance 
of a given crew member may be expected to 
have a measurable effect. Consequently, it 
was taken as a basic assumption that for the 
measurement of success of individuals, the 
performance of ships or comparable shore 
units represents the highest level at which 
meaningful and practical quantitative cri- 
terion measurement can be established. As 
such these performances also comprise the 
basis for determining the validity and prac- 
tical meaningfulness of subordinate criteria 
—for departments, divisions, smaller groups, 
and individuals. 


Criteria of Individual Success 


Several questions which arise during the 
process of evaluating naval personnel are believed
to have bearing in programs of evaluation
for other kinds of administrators and
executives. These are discussed below. 

What is “successful performance” for the 
individual? At what rank, or after what time 
on the job, can or should evaluation be made? 

The possibility must be recognized that not 
all officers who are competent at one level of 
rank or responsibility will be equally com- 
petent at the next or other higher levels. (It 
has not been shown that a good ensign neces- 
sarily makes a good admiral.) The question 
then arises: Is “success” composed of the 
same factors at each level? If not, we are
confronted with constantly shifting criteria 
and must choose the intermediate criterion 
level that is desirable for a particular class of 
officers on logical or empirical grounds. 

As the correlation between criteria for the 
several ranks decreases, the risk increases that 
the selection variables validated against lower- 
level rank criteria will be unrelated to higher- 
level rank criteria, or indeed may be nega- 
tively related to them. 
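The risk described here can be illustrated with a standard identity for three intercorrelated variables. It is not part of the original article and is offered only as a sketch. If a selection variable correlates r_xy with the lower-rank criterion, and the two rank criteria correlate r_yz with each other, then the correlation of the selection variable with the higher-rank criterion must lie between r_xy r_yz minus and plus the square root of (1 - r_xy²)(1 - r_yz²). The function name and the example values are hypothetical.

```python
from math import sqrt


def third_correlation_bounds(r_xy, r_yz):
    """Range of possible r_xz given r_xy and r_yz.

    Follows from the requirement that a 3 x 3 correlation matrix
    be positive semidefinite.
    """
    center = r_xy * r_yz
    half_width = sqrt((1 - r_xy ** 2) * (1 - r_yz ** 2))
    return center - half_width, center + half_width


# Hypothetical validity of .60 against the lower-rank criterion.
for r_between_ranks in (0.9, 0.7, 0.5):
    low, high = third_correlation_bounds(0.6, r_between_ranks)
    print(f"criteria correlate {r_between_ranks}: "
          f"validity at higher rank between {low:.2f} and {high:.2f}")
```

As the correlation between the rank criteria falls, the lower bound drops below zero, which is the case in which a validated selection variable may be negatively related to the higher-level criterion.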

We are thus confronted with the question: 
When are we to define “success” as having 
been achieved? Is it at Officer Candidate 
School, or when an ensign, or when a com- 
mander, or when a chief-of-staff, or is per- 
formance in the next superior rank an ade- 
quate criterion against which to evaluate per- 
formance at immediately subordinate ranks? 

We are also confronted with the related 
problem of deciding what criterion levels to 
choose for assessing the effectiveness of train- 
ing. That is, by what performance standards, 
at what level of responsibility, should the 
adequacy of training at Officer Candidate 
School and elsewhere be judged? Should per- 
formance right after schooling be taken as 
the measure of training effectiveness, or should 
evaluations be made after some specified time 
has elapsed? 

Likewise, with regard to selection, assign- 
ment, promotion, retirement, and command, 
there arises the question of where to look for 
appropriate standards. 

Competency in an executive hierarchy is 
usually considered to be highly correlated with 
rank. Ideally, rank and competence in relevant
areas should be perfectly correlated.
To the extent that the correlation is less than 
1.00, room exists for improvement of tech- 
niques for evaluating training and duty per- 
formance, and for assignment to jobs. 

It must be recognized that much of the 
preceding involves policy decisions at a high 
level and consists of questions which cannot 
be answered by a research unit. Lack of such 
policy decisions leaves the research goals in 
doubt, leads to confused direction, and lowers 
the utility of research products. 


Criterion Methodology 


The relative status of a variable as a pre- 
dictor or a criterion is in many cases simply 
dependent upon the chronological sequence in 
which variables may be organized. Each in- 
termediate performance criterion presumably 
should be correlated with a more ultimate 
criterion and hence serve as a predictor of it. 
As demanded by exigencies, many perform- 
ance measures can be considered either cri- 
teria or predictors. 

Comparability of measures of performance 
is a basic requirement if such measures are to 
be used effectively. If success is determined 
at all ranks and in all duties by the same fac- 
tors, with differences from rank to rank rep- 
resenting only variations in degree rather than 
kind of factors involved, then the approach to 
evaluation is relatively simple. Although this 
is not likely to be true in most cases, it in- 
volves primarily attempts to increase relia- 
bility of measures of the factors demonstrated 
to possess the highest validity as criteria. 

Questions of the comparability of criterion 
measures then crop up with respect to all 
aspects of criterion measurement and we are 
faced with the problem of how to render
criterion measures equivalent in consideration 
of differences of rank, raters, ship types, duty, 
hazard, kind of subordinates, and other situa- 
tional factors. 

Applicable to all of the foregoing is the 
question: What is the line of demarcation between
satisfactory and unsatisfactory performance?
Does the standard of satisfactory
and unsatisfactory fluctuate as a function of 
any, or several, of the above? 


Illustrative Criterion Measures


Individual job performance criteria may be 
classified in many ways, dictated by the in- 
stitution’s goals and organization. For naval 
officers we have organized them under two 
broad headings: technical skill and human 
relations skill.

Each of these sets of skills in turn may be
considered as effectors of success at several
levels in the operational hierarchy, which in
the Navy would be the ship, the group (departmental,
divisional, or other), and the individual.

These may further be sub-classified as to
whether they are demonstrated under training
conditions or on-the-job.

Finally these criteria may be further specified
according to the type of measure being
applied, as schematized, for example, in Chapter
5 of R. L. Thorndike’s Personnel Selection,
Wiley, 1949.


How’s Your Empathy?


“Empathy” is a word that has been in the
dictionaries a long time but it’s just beginning
to gain recognition as an important quality
for executives at all levels. One dictionary
definition is “the imaginative projection of
one’s own consciousness into another being”
but as used by psychological consultants in
business and industry empathy is used to indicate
the ability to imaginatively project the
other fellow’s consciousness into your own,
thereby putting yourself mentally into his
shoes, to the point of being able to guess
pretty closely what his thoughts and reactions
will be in a given situation.

In its simplest form empathy is well illustrated
by the old story of the village idiot
who found the lost horse when nobody else
could. He just sat down and figured out
where he would go if he were a horse. He
went there and there was the horse.

At a recent meeting of chemical engineers,
Dr. Richard S. Schultz, a New York psychological
consultant, mentioned the importance
of empathy as an executive quality, pointing
out that individuals who rate high in this characteristic
can more readily understand, predict
and control the thinking, feeling, and
actions of other people. Psychologists are
now at work devising methods of measuring
this quality.

“The simplest illustration of empathy is to 
recall your last experience at an exciting 
athletic event or theater show,” he said. 
“Remember how you reacted and identified 
yourself with specific thoughts, feelings, and 
actions of the feature personalities?” 

Empathy, he said, may be further described 
as a combination of social sensitivity and 
social intelligence. “It is with such awareness 
that we can be most skillful in our daily con- 
tacts with people,” he said. 

It is encouraging to know that progress is 
being made in measuring the characteristic 
technically known as empathy, for it is a
trait that under various inexact tags has been 
recognized as an important though elusive 
attribute of success. It is often the reason 
why two men of apparently equal upbringing, 
education, intelligence, and opportunity will 
vary so widely in their degree of business suc- 
cess: one of them can see things from the 
other fellow’s point of view; the other can’t, 
and continually rubs people the wrong way. 

If accurate measurements are on the way 
that will measure this sort of “social savvy”
they will be of particular use to the insurance 
business, in which cooperating with people 
and getting along with them on a good basis 
are so much more important than purely tech- 
nical know-how. (The National Underwriter, 
July 3, 1953.) 








Book Reviews 


Maier, Norman R. F. Principles of human 
relations, applications to management. 
New York: John Wiley & Sons, Inc., 1952. 
Pp. ix + 474. $6.00. 


This book appears to fulfill three purposes, 
(1) to present Dr. Maier’s research and ex- 
perience with human relations training pro- 
grams in industry, (2) to furnish a basic text- 
book for courses in human relations in indus- 
try with material adaptable to laboratory 
exercises, and (3) to serve as a manual to 
guide industrial psychologists in introducing 
human relations programs in business and in- 
dustry. Although the systematic discussion is 
based primarily upon Dr. Maier’s own work, 
the conclusions that he reaches are similar to 
those which have been obtained previously by 
others in the area of human relations. 

The material deals mainly with methods 
and techniques for a human relations training 
program, including the use of group discussion 
methods, role playing, and group decision pro- 
cedures. The use of such techniques is aimed 
at overcoming hostility, fears, feelings of in- 
security, frustrations, and other barriers to 
acceptance of democratic supervisory prac- 
tices. In addition, how to assist the super- 
visor to be permissive in his dealing with 
individuals and to use non-directive counsel- 
ing techniques is discussed at some length. 
Ample case material is furnished to provide 
demonstrations of the value of the various 
methods and techniques discussed. Of par- 
ticular value is an exposition of how group 
discussion and role playing techniques can be 
adapted for use with large groups when it is 
not possible to use small groups in training. 

Concepts employed are well defined and
explained. The general ease of reading is 
marred only by an occasional awkward sen- 
tence, and failure to provide adequate transi- 
tion from one idea to another. 

The major emphasis of the book is aimed 
at explaining and furnishing demonstrations 
of the value of group discussion and role play- 
ing for supervisory training at all levels from 
top management to line supervisors. Through 
the use of such techniques it is possible to 
change a supervisor's feelings and attitudes 
which conflict with maintaining good human 
relations with his group of workers. The use 
of role playing in training situations directs a
supervisor’s attention away from the words 
or logic which are overtly expressed and 
focuses it upon the feelings which govern the 
course of interpersonal relations. Most im- 
portant of course he comes to understand his 
own involvement in the process. Only through 
gaining insight in regard to how feelings in- 
fluence the tenor of interpersonal relations and 
how they determine the kind of mutual un- 
derstanding which results can a supervisor, if 
necessary, come to appreciate and accept new 
modes of behavior appropriate for dealing 
effectively with other people. Role playing 
provides a means of gaining experience under 
conditions which do not require a supervisor 
to “save face.” Consequently, a situation is 
provided where he can examine objectively 
how supervisory attitudes, both good and bad, 
influence the course and outcome of inter- 
personal relations. He comes to realize that 
being permissive is more advantageous for ob- 
taining constructive actions with a concurrent 
improvement in his status with the workers in 
regard to their respect for his authority, con- 
trol and prestige. Such an outcome removes 
the hampering effect of presumed risks which 
a supervisor imagines might endanger his 
capacity to carry out his responsibilities if he 
adopts democratic practices. Once he is freed 
of concern for the necessity of protecting his 
own security, a supervisor is able to let his 
group solve its own problems under his guid- 
ance. When increased effectiveness of the 
group in accomplishing work ensues, the su- 
pervisor is enabled to realize the value of 
exercising democratic control by utilizing the 
forces which are in the group rather than by 
depending upon the use of his power. 

Dr. Maier’s book is a good illustration of 
the basic contribution that psychology can 
make toward developing a realistic philosophy 
of making life and work tolerable in modern 
industrial society. True, the basic principles 
of such a philosophy still go back to Aristotle, 
Plato, the Christian ethics, and the English 
common law. But psychology, by utilizing 
the scientific method, can still contribute ma- 
terially to their realization by furnishing 
proof of the effectiveness of various methods 
and techniques which when fully assimilated 
into the mores of our society will come eventually
to be considered common sense approaches
to maintaining good human relations.
It is fortunate that psychology has
men with the capability and insight of Dr. 
Maier, whose approach to developing an ap- 
plied science of human behavior transcends 
the bonds of narrow specialization. 
Wilton P. Chase 
Air Research and Development Command, 
Human Resources Research Center, 
Lowry Air Force Base, Denver, Colorado 


Deese, James. The psychology of learning. 
New York: McGraw-Hill, 1952. Pp. x 
+ 398. $5.00. 

Since this book is designed as a text (for 
advanced undergraduate and graduate stu- 
dents), one of the difficulties inherent in all 
textbook reviews is encountered here: the
professional reader will find in it much that
is already familiar, and low in interest value, 
but which the student may react to very dif- 
ferently. The reviewer must therefore try, 
as best he can, to look at the book through 
student eyes. When this is done, the present 
volume stands up well. It is clearly and 
simply written, with an occasional colorful 
turn of phrase. It is, moreover, comprehen- 
sive in scope—including, as it does, discussion 
of animal and human learning in both labora- 
tory and everyday (clinical, applied) settings 
—and yet wisely avoids trying to be encyclo- 
pedic in coverage: “broad rather than exhaus- 
tive” is the author’s avowed aim. While the 
book is not founded upon or integrated around 
any one conception of learning, and thus not 
so provocative as it might otherwise be, it has 
the merit of being accurate, critical, and of 
very possibly inspiring students to get right 
to work on research designed to fill in the 
more glaring gaps in our knowledge. 

Perhaps the gravest short-coming of the 
book is the author’s failure to interrelate dis- 
cussions in different chapters. Not infre- 
quently a given piece of research or theory 
will be discussed, sympathetically and well, 
in one chapter; and yet, in a later (or earlier) 
chapter, this material will not be brought to 
bear upon problems where it is rather obvi- 
ously relevant. A similar criticism is also 
occasionally appropriate with respect to experimental
facts and hypotheses available in
the literature which have not been included 
in the book at all. The net result is that the 
book lacks more in total impact and cogency 
than it needs to. However, it is a good start 
and in later editions may well rise to meet 
the challenge of its topic more fully than it 
presently does. 

The author is well aware of the “applied” 
potentialities of the psychology of learning 
and refers from time to time to such fields as 
education and psychotherapy. However, he 
is conservative in what he believes labora- 
tory fact and theory can at present contribute 
along these lines. He very usefully points to 
some of the not very happy results of prema- 
ture application of particular conceptions of 
learning and urges further inquiry rather than 
rash “practicality.” 

While The Psychology of Learning puts a 
desirable emphasis upon laboratory proce- 
dures as a source of knowledge in this area, it 
tends to slight what is already known and can 
be further learned in the “applied” setting. 
In other words, it does not emphasize as much 
as it might the reciprocal benefits of interac- 
tion between laboratory and field. It some- 
times seems to imply that all knowledge origi- 
nates in the laboratory and is then channeled 
toward application in the field. The author 
properly notes that laboratory theories some- 
times fall on their face in a practical setting, 
but he does not, in the reviewer’s judgment, 
give the field proper credit as itself a kind of 
“laboratory” and certainly a setting which 
can provide highly stimulating questions and 
suggestions to be carried back for more rigor- 
ous types of investigation. 

The Psychology of Learning is an excellent 
job of bookmaking; and despite its being 
pitched at the textbook level, professional 
readers will find parts of it novel and exciting. 


O. Hobart Mowrer 
University of Illinois 


Ulrich, David N., Booz, Donald R., and 
Lawrence, Paul R. Management behavior 
and foreman attitude. Boston: Harvard 
Business School, 1950. Pp. 56. $.75. 


This report is the result of an 8-month case 
study made by a research team consisting of 
the three authors. The study was carried out 
through informal observation and interviews 
in a manufacturing firm employing about 500 
persons located in a large eastern city. About 
half the time of the study was spent in ob- 
servation of an assembly department of 36 
female employees and their foreman. 

As the title implies, the main object of the 
study was to determine what effects the be- 
havior of top management had on the fore- 
man. In addition, the effects of the behavior 
of other groups on the foreman, including his 
employees, staff specialists, and his immedi- 
ate superior were also studied. 

A number of difficult and strained relation- 
ships at all levels of the organization are 
pointed out, causes of these difficulties hy- 
pothesized, and recommendations made on 
how the relationships could be improved. A 
major recommendation made is that top man- 
agement make greater efforts to understand 
the effects of administrative action on em- 
ployees and supervisors. 

Because the only evidence given to back up 
what is said consists of scattered selected 
quotations of remarks made by those ob- 
served, the reader will find he is being asked 
to accept the conclusions and recommenda- 
tions pretty much on faith in the analytical 
ability of the researchers. Despite this limi- 
tation, however, few readers who deal with 
similar problems will finish this report without 
some new insights into these problems and 
new ideas for dealing with them in their own 
situations. 

Theodore R. Lindbom 
Personnel Department, 
Midland Cooperative Wholesale, 
Minneapolis, Minn. 


Weinland, James D. and Goss, Margaret V. 
Personnel interviewing. New York: Ronald 
Press, 1952. Pp. vii + 416. $6.00. 

This book deals with the aims and tech- 
niques of business interviewing and is ad- 
dressed to individuals concerned with per- 
sonnel relations and employment. Although 
many of the principles and procedures are ap- 
plicable to all types of personnel interviewing, 
the book emphasizes employment. Chapters 
are devoted, however, to other types of inter- 
viewing, such as merit rating, disciplinary, 
counseling, etc. 

A section on the interviewer and his work 
deals with introductory and background ma- 
terial, ranging from individual differences to 
interviewing environment and the training of 
interviewers. A second part deals with tech- 
niques, including material on directive, non- 
directive, and patterned interviews. The third 
part of the book deals with interviews for 
various purposes. 

Although the book contains much of value 
and has interesting material and views, the 
over-all effect is disappointing. Perhaps one 
reason for disappointment is the great need 
for a comprehensive and up-to-date text in 
the field of personnel interviewing. The chap- 
ters of this book get off to a good start, but 
the reviewer had a feeling of disappointment 
at the end of each. This was due partly to 
failure of the authors to organize and sys- 
tematize the material adequately. This is 
true in spite of their predilection for lists 
and classifications. Unfortunately, such lists 
often appeared incomplete or haphazard. 

The authors are guilty of looseness, am- 
biguity and over-generalization. One suspects 
that some dogmatically worded statements 
would be considered better as hypotheses than 
as proved facts. 

The reviewer's over-all opinion is indicated 
by the fact that although he is currently train- 
ing interviewers, he is not using this book. 
Other materials are being used even though 
older, or available only in less accessible form. 

Clifford E. Jurgensen 

Minneapolis Gas Company 


The workshop handbook. Walter A. Ander- 
son, Rollin P. Baldwin, and Mary Beau- 
champ. New York: Columbia University, 
1953. Pp. 65. $1.00. 

Adjustment to physical handicap and illness: 
A survey of social psychology of physique 
and disability. Roger G. Barker, Beatrice 
A. Wright, Lee Meyerson, and Mollie R. 
Gonick. New York: Social Science Re- 
search Council, 1953. Pp. 440. $2.00. 

Differential migration in the corn and cotton 
belts. Donald J. Bogue and Margaret Jar- 
man Hagood. Oxford: Scripps Foundation, 
1953. Pp. 248. $2.25. 

The fourth mental measurements yearbook. 
Oscar K. Buros, Editor. Highland Park, 
N. J.: The Gryphon Press, 1953. Pp. 
1,189. $18.00. 

Group dynamics. Dorwin Cartwright and 
Alvin Zander. Evanston: Row, Peterson 
and Company, 1953. Pp. 642. 

Human behavior: psychology as a bio-social 
science. Lawrence E. Cole. New York: 
World Book Company, 1953. Pp. 884. 
$4.56. 

A factor analysis of verbal and non-verbal 
tests of intelligence. Reverend James T. 
Curtin. Washington, D. C.: The Catholic 
University of America Press, 1952. Pp. 
63. $1.25. 

Raising the sights of office management. M. 
J. Dooher, Editor. New York: American 
Management Association, 1953. Pp. 59. 
$1.25. 

Industry enters the atomic age. M. J. 
Dooher, Editor. New York: American 
Management Association, 1953. Pp. 31. 
$1.25. 

Guides to meeting tomorrow’s production 
needs. M. J. Dooher, Editor. New York: 
American Management Association, 1953. 
Pp. 64. $1.25. 

Planning for worker security and stability. 
M. J. Dooher, Editor. New York: Ameri- 
can Management Association, 1953. Pp. 
40. $1.25. 

The new climate of union-management rela- 
tions. M. J. Dooher, Editor. New York: 
American Management Association, 1953. 
Pp. 32. $1.25. 

Factors in intelligence and achievement. 
Justin A. Driscoll. Washington, D. C.: 
The Catholic University of America Press, 
1952. Pp. 56. $1.00. 

College board scores. Henry S. Dyer. New 
Jersey: College Entrance Examination 
Board, 1953. $.75. 

Stabilization of employment is good manage- 
ment. Charles C. Gibbons. Kalamazoo: 
W. E. Upjohn Institute for Community Re- 
search, 1953. Pp. 16. Gratis. 

The uneducated. Eli Ginzberg and Douglas 
W. Bray. New York: Columbia Univer- 
sity Press, 1953. Pp. 246. $4.50. 

Psychosis and civilization. Herbert Gold- 
hamer and Andrew Marshall. Glencoe: 
The Free Press, 1953. Pp. 126. $4.00. 

Measurements of human behavior. Edward 
B. Greene. Revised edition. New York: 
The Odyssey Press, Inc., 1953. Pp. 790. 
$4.75. 

A clinical approach to children’s Rorschachs. 
Florence Halpern. New York: Grune and 
Stratton, Inc., 1953. Pp. 288. $6.00. 

Introduction to psychology. Ernest R. Hil- 
gard. New York: Harcourt, Brace and 
Company, 1953. Pp. 659. $7.50. 

Current problems in psychiatric diagnosis. 
Paul H. Hoch and Joseph Zubin. New 
York: Grune & Stratton, 1953. Pp. 291. 
$5.50. 

The psychology of successful selling. Richard 
W. Husband. New York: Harper and 
Brothers, 1953. Pp. 306. $3.95. 


Techniques of successful foremanship. Eugene 
E. Jennings. Madison: University of Wis- 
consin, School of Commerce, Bureau of 
Business Research and Service, 1953. Pp. 
41. $1.15. 

Psychology and alchemy. C.G. Jung. New 
York: Bollingen Foundation, Inc., 1953. 
Pp. 563. $5.00. 

The psychology and psychotherapy of Otto 
Rank. Fay B. Karpf. New York: Philo- 
sophical Library, 1953. Pp. 129. $3.00. 

Elementary school objectives. Nolan C. 
Kearney. New York: Russell Sage Foun- 
dation, 1953. Pp. 189. $3.00. 

Rehabilitation of the physically handicapped. 
Henry H. Kessler. New York: Columbia 
University Press, 1953. Pp. 275. $4.00. 

Statistical methods in experimentation. No- 
lan C. Lacey. New York: Macmillan Co., 
1953. Pp. 249. 

The retarded reader in the junior high school. 
May Lazar, Editor. New York: Board of 
Education, Bureau of Educational Re- 
search, 1952. Pp. 126. 

The psychology of personal and social ad- 
justment. Henry Clay Lindgren. New 
York: American Book Company, 1953. 
Pp. 481. $4.50. 

Design and analysis of experiments in psy- 
chology and education. E. F. Lindquist. 
Boston: Houghton Mifflin Co., 1953. Pp. 
393. $6.50. 

In the minds of men. Gardner Murphy. 
New York: Basic Books, Inc., 1953. $4.50. 

Rorschach interpretation: advanced tech- 
nique. Leslie Phillips and Joseph G. Smith. 
New York: Grune and Stratton, Inc., 1953. 
Pp. 400. $8.75. 

Wait the withering rain. Austin L. Porter- 
field. Fort Worth: Leo Potishman Foun- 
dation, 1953. Pp. 147. $2.50. 

Social psychology. S. Stansfeld Sargent. 
New York: The Ronald Press Company, 
1953. Pp. 519. $4.50. 

New York: Prentice-Hall, Inc., 1952. 

An experiment in recreation with the men- 
tally retarded. Bertha E. Schlotter and 
Margaret Svendsen. Chicago: Illinois De- 
partment of Public Welfare, 1951. Pp. 
142. Gratis. 

C. Townsend. New York: McGraw-Hill 
Book Co., Inc., 1953. $4.00. 

Modern educational problems. Arthur E. 
Traxler, Editor. Washington, D. C.: 
American Council on Education, 1953. Pp. 
147. $1.50. 

Improving transition from school to college. 
Arthur E. Traxler and Agatha Townsend. 
New York: Harper & Brothers, 1953. Pp. 
165. $2.75. 

Personality tests and assessments. Philip E. 
Vernon. London: Methuen & Co. Ltd., 
1953. Pp. 220. 

The roots of psychotherapy. Carl A. Whi- 
taker and Thomas P. Malone. New York: 
The Blakiston Company, Inc., 1953. Pp. 
236. $4.50. 

The measured effectiveness of employee pub- 
lications. Association of National Adver- 
tisers, 1953. Pp. 109. 

Drug addiction among adolescents. Com- 
mittee on Public Health Relations of the 
New York Academy of Medicine. New 
York: The Blakiston Company, Inc., 1953. 

By J. Stanley Gray, University of Georgia. McGraw-Hill Series in Psychology. 
581 pages, $6.00 


A revision of PSYCHOLOGY IN HUMAN AFFAIRS, this text remains essentially the 
same in purpose and tone, but is completely modernized. An introduction to applica- 
tion of psychology in more than twenty fields, each chapter presents uses and factual 
fields. 

With important stress upon each level of development as foundation for the next, this 
text covers the life span from conception to death with emphasis on outstanding char- 
acteristics in each major life period. Close correlation between mental and physical 
growth and methods of change in interests, attitudes, and behavior are discussed. In- 
cluded also is a review of major experimental studies. 

and on all subjects. It is valuable also in developing performance tests of skill as well 
as written tests of knowledge and abilities. Principles and applications are offered with 
simplicity and intelligibility, and complex statistical methods are avoided. 

By C. H. Lawshe, Purdue University. 350 pages, $5.50 


Presented in an unusual, non-technical, and practical manner, this book shows the non- 
psychologists, and even those who have never studied psychology, the things industrial 
psychology offers. Aimed specifically at supervisors and managers in industry, it also 
is useful to the college student studying management. 